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Existing  circuit  models  lor  short  channel  MOS  transistors  represent  a  compromise 
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Abstract 


Existing  circuit  models  for  short  channel  MOS  transistors  represent  a  compromise 
between  computation  speed  and  ease  of  use.  Empirical  models  are  very  fast  to 
evaluate,  but  their  parameters  must  be  fitted  from  experimental  measurements. 
Theoretical  models  require  longer  computation  time,  but  they  may  be  used  to  predict 
the  performance  of  new,  unmeasured  MOS  technologies  since  their  parameters  are  not 
curve-fitted  from  experimental  data. 

This  thesis  combines  the  best  features  of  both  types  of  model,  yielding  a  fast  circuit 
simulator  whose  input  parameters  need  not  be  extracted  from  experimental 
measurements.  A  nonlinear  optimization  algorithm  is  used  to  "compile"  the  parameters 
of  a  theoretical  model  into  parameters  for  an  empirical  model,  providing  the  superior 
user  interface  of  theoretical  models  without  sacrificing  simulator  execution  speed. 
Results  produced  by  a  prototype  model  compiler  are  presented,  showing  the  modeling 
error  to  be  approximately  5  percent. 
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Chapter  1:  Introduction 

One  of  the  major  challenges  in  integrated  circuit  engineering  is  to  minimize  the 
number  of  design  iterations  required  to  fully  debug  a  chip.  Error  tracing  is  made 
difficult  by  the  small  size  of  the  chip,  since  micro  probes  or  electron  microscopes  are 
required  to  monitor  the  behavior  of  internal  nodes.  When  errors  are  finally  located  and 
corrections  are  proposed,  they  are  both  expensive  and  time-consuming  to  implement: 
the  entire  chip  must  be  refabricated,  even  if  the  modification  affects  only  one 
transistor.  This  process  may  cost  up  to  $20,000  and  require  6  weeks  to  complete. 

The  best  way  to  avoid  design  revisions  is  to  debug  a  chip  before  it  is  ever 
fabricated:  usually  this  is  done  by  computer  simulation.  Simulation  makes  circuit 
behavior  easy  to  examine,  so  malfunctions  can  be  rapidly  located  and  repaired. 
Proposed  modifications  are  then  simulated  to  see  if  they  remove  the  error.  An  iterative 
process  of  redesign  and  re-simulation  is  performed  until  the  simulated  circuit  behavior 
is  acceptable. 

This  thesis  is  concerned  with  the  specific  task  of  MOS  circuit  simulation;  that  is, 
accurately  predicting  the  analog  waveforms  produced  in  a  network  of  MOS  transistors 
under  specified  input  excitations.  Simulation  accuracy  is  impossible  unless  the 
nonlinear  characteristics  of  the  individual  transistors  are  accurately  modeled.  Providing 
acceptable  modeling  accuracy  at  low  (simulation)  cost  is  the  goal  of  this  work. 

High  performance  is  obtained  in  modern  MOS  technologies  by  scaling  down  the 
transistor  dimensions  (11,  29].  Unfortunately,  as  they  become  smaller,  transistors 
deviate  significantly  from  the  behavior  predicted  by  classical  gradual-field  theories. 
These  "short  channel  effects"  drastically  complicate  the  physical  theory  of  transistor 
behavior,  and  hence  increase  the  complexity  of  transistor  models.  Two  distinct 
approaches  to  short  channel  transistor  modeling  have  been  successfully  employed  in 
circuit  simulators:  theoretical  models  (e.g.,  [13])  and  empirical  models  (e  g.,  [19]).  The 
terms  "theoretical"  and  "empirical"  are  new;  I  use  them  to  call  attention  to  the 


structural  difference  between  two  general  classes  of  MOS  models. 

Theoretical  models  are  created  by  performing  a  complete  solid-state 
electrodynamic  analysis  of  the  MOS  transistor  structure.  Each  identifiable  physical 
effect  is  explicitly  accounted  for,  in  an  attempt  to  eliminate  all  possible  sources  of 
error.  The  resulting  equations  are  then  directly  used  as  a  model  for  circuit  simulation. 

The  parameters  of  a  theoretical  model  are  simply  the  physical  attributes  of  a 
particular  MOS  technology  (e.g.,  oxide  thickness,  substrate  doping,  junction  depth, 
etc.).  These  constants  are  carefully  measured  and  closely  controlled  in  the 
semiconductor  fabrication  process.  Because  the  model  parameters  are  so  accurately 
known,  theoretical  models  are  extremely  easy  to  use:  physical  constants  of  the 
technology  are  plugged  directly  into  the  circuit  simulator. 

New  MOS  processes  are  also  readily  simulated  with  theoretical  models,  by  inserting 
target  values  of  the  physical  fabrication  parameters  into  the  simulator.  Circuits  can  be 
accurately  simulated  even  before  the  first  wafer  has  been  processed,  allowing  chip 
design  and  technology  development  to  proceed  in  parallel.  The  price  paid  for  this 
extreme  flexibility  is  slow  simulation.  Each  separate  physical  effect  is  modeled 
individually;  some  effects  require  transcendental  function  calculations  or  nested 
iterations.  When  all  of  these  calculations  are  combined,  the  final  modeling  routine 
becomes  extremely  complex,  requiring  a  very  large  number  of  computational  operations 
to  predict  transistor  behavior.  The  high  cost  of  theoretical  models  has  led  to  the 
development  of  an  alternative  strategy,  which  I  will  refer  to  as  "empirical"  modeling. 

Empirical  models  are  not  rooted  in  solid-state  physics;  rather,  they  are 
mathematical  expressions  which  provide  a  good  curve-fit  to  the  behavior  of  MOS 
transistors.  Whereas  theoretical  models  strive  for  accuracy  at  any  cost,  empirical 
models  strive  for  acceptable  accuracy  at  very  low  cost.  Since  individual  transistors  on 
the  same  physical  wafer  are  subject  to  parameter  variations,  modeling  accuracy  beyond 
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a  certain  level  is  illusory. 

Instead  of  separately  accounting  for  each  individual  physical  effect,  empirical 
models  often  lump  several  effects  together,  and  they  occasionally  ignore  some  effects 
entirely.  Accuracy  is  thus  traded  against  model  evaluation  speed,  accepting 
engineering  approximations  in  return  for  fast  simulation.  As  R.  C.  Foss  has  said  [19], 
"A  ±30 percent  representation  of  a  10  percent  effect  gives  3  percent  accuracy." 

Since  their  equations  are  not  constrained  to  be  physically  motivated,  empirical 
models  can  employ  any  convenient  mathematical  expressions  which  give  reasonable 
accuracy.  Simple  polynomials  are  often  used,  because  they  can  provide  acceptable 
curve-fits  while  permitting  fast  evaluation.  The  coefficients  of  these  polynomials  are  the 
"fudge  factois"  which  are  adjusted  to  obtain  a  good  fit.  Because  the  "fudge  factors" 
are  not  physical  constants  of  the  fabrication  process,  their  values  are  unpredictable, 
and  must  be  extracted  from  measurements  made  on  actual  transistors.  New,  as-yet- 
unfabricated  technologies  cannot  be  simulated  with  empirical  models,  because  no  test 
transistors  exist  for  parameter  extraction. 

A  hybrid  modeling  scheme  which  combines  the  flexibility  of  theoretical  models  with 
the  fast  simulation  speed  of  empirical  models  is  presented  in  this  thesis.  Model 
parameters  are  (easily  obtainable)  physical  fabrication  constants,  yet  high  speed 
empirical  equations  are  implemented  in  the  circuit  simulator.  This  is  achieved  by 
"compiling"  the  physical  constants  for  a  theoretical  model  into  the  parameters 
necessary  to  drive  an  empirical  model  which  is  used  by  the  circuit  simulator. 

The  model  parameter  compilation  process  is  analogous  to  a  high-level  computer 
language  compiler.  A  slow  translation  program  is  invoked  once,  generating  a  fast 
object  program  which  is  subsequently  invoked  many  times.  In  the  MOS  modeling 
domain,  the  "object  program"  is  a  set  of  parameters  which  will  drive  a  fast  empirical 
model  (permitting  fast  circuit  simulation).  The  "translation  program"  uses  the  input 
theoretical  model  parameters  (the  "source  program")  to  simulate  several  test 
transistors,  mimicking  the  measurements  one  would  make  to  perform  a  manual 
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parameter  extraction.  These  measurements  are  fed  into  a  multiple- parameter  nonlinear 
curve  fitting  program,  which  generates  optimal  (best  fit)  parameters  to  drive  the 
empirical  model. 

In  some  cases,  it  might  be  possible  to  analytically  compute  the  empirical  model 
parameters,  given  the  theoretical  model  parameters  ( e.g when  the  empirical  model  is  a 
subset  of  the  theoretical  model).  However,  a  more  general  solution  is  desirable,  which 
would  permit  compiling  between  any  two  arbitrarily  selected  models.  Parameters  could 
then  be  compiled  for  several  different  empirical  models,  to  evaluate  which  one  gives 
the  best  tradeoff  between  speed  and  accuracy.  To  allow  these  sorts  of  experiments,  a 
curve-fitting  program  was  constructed. 

Of  course,  transistor  modeling  is  only  one  part  of  the  work  performed  by  circuit 
simulators,  so  it  is  possible  that  a  drastic  speedup  in  transistor  modeling  might  result  in 
only  slightly  improved  total  simulation  speed.  Figure  1.1  shows  the  total  time  required 
I  to  simulate  the  same  circuit  using  two  different  MOS  transistor  models.  One  model 

(Berkeley)  is  theoretical,  and  the  other  model  (Mosaid)  is  empirical.  Data  for  this  figure 
is  taken  from  the  SPICE  2G.5  simulator,  running  on  the  VAX1  11/780  computer  under 
I  the  V7  UNIX  operating  system.  For  this  particular  circuit,  simulation  time  was  reduced 

by  35%,  simply  by  switching  to  an  empirical  model. 

The  present  modeling  effort  is  concentrated  on  the  steady-state  (DC)  current- 
voltage  characteristics  of  MOS  transistors.  Although  capacitances  and  their  model 
parameters  have  been  omitted  for  simplicity,  there  is  no  fundamental  limitation  which 
prevents  the  compilation  technique  from  extracting  them.  Indeed,  since  most 
capacitance  parameters  do  not  interact  with  DC  parameters,  two  separate  compilations 
could  be  employed,  thus  reducing  the  dimensionality  of  the  parameter-space  and 

*VAX  is  a  registered  trademark  of  Digital  Equipment  Corporation. 

^UNIX  is  a  registered  trademark  of  Bell  Telephone  Laboratories. 
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Figure  1.1  Circuit  simulation  time  for  different  MOS  models 

improving  compilation  speed. 

The  present  effort  emphasizes  the  compilation  aspects  of  transistor  modeling;  old, 
well-known  models  were  used  in  the  compilation  experiments,  rather  than  developing 
entirely  new  models.  However,  the  technique  presented  here  is  a  general  one,  allowing 
parameters  for  virtually  any  MOS  model  to  be  compiled  into  parameters  for  virtually  any 
other  model.  (For  reasons  of  time,  this  work  concentrated  on  the  Berkeley  and  Mosaid 
models).  The  same  method  could  be  applied,  for  example,  to  generate  table  entries  for 
a  totally  table-driven  empirical  model  [41],  or  to  compute  optimal  coefficients  for  a 
cubic-spline  interpolation  scheme. 

The  use  of  numerical  minimization  for  parameter  extraction  has  applications 
besides  the  proposed  compilation  strategy.  If  an  MOS  technology  has  been  in 
existence  for  some  time,  actual  test  transistors  are  available  for  measurement,  and  this 
data  can  be  fed  to  the  minimization  routine.  Optimal-fit  parameters  can  then  be 
extracted  directly  from  the  physical  data  [7.  39].  However,  this  technique  does  not 


permit  the  designer  to  easily  perturb  the  simulation  (e.g.,  to  see  the  effects  of  over- 
etched  polysilicon),  because  the  empirical  model  parameters  are  not  directly  related  to 
physical  quantities. 

This  thesis  is  divided  into  five  chapters.  Chapter  1  discusses  the  idea  of  using  a 
two-level  modeling  technique,  and  proposes  the  technique  of  constructing  a  parameter 
compiler.  MOS  models  are  the  topic  in  chapter  2.  The  distinctions  between  theoretical 
and  empirical  models  are  discussed,  and  the  two  example  models  used  in  the 
parameter  compilation  experiments  are  described.  Chapter  3  presents  numerical 
algorithms  for  nonlinear  minimization;  they  are  subsequently  used  in  the  parameter 
compilation  programs.  Chapter  4  discusses  the  process  of  model  parameter  extraction. 
Manual  extraction  techniques  are  presented,  and  then  fully  automated  procedures  are 
described.  Results  obtained  from  prototype  versions  of  model  parameter  compilers  are 
presented,  and  the  computational  benefits  of  the  compilation  strategy  are  outlined. 
Chapter  5  presents  some  concluding  remarks,  along  with  suggestions  for  extending  and 
improving  this  work. 


Chapter  2: 


DC  Models  for  MOS  Transistors 


2.1:  Introduction 

This  chapter  explores  the  problem  of  constructing  a  circuit  model  of  the  MOS 
field-effect  transistor.  The  model  is  nothing  more  than  a  set  of  equations  which 
predicts  the  device’s  current-voltage  behavior.  A  familiar  example  of  a  model  is  Ohm’s 
Law,  which  states  that  the  current  through  a  resistor  is  linearly  proportional  to  the 
voltage  across  its  terminals: 


(2.1.1) 

A  simple,  intuitive  model  of  an  idealized  transistor  is  presented,  to  show  the 
general  form  of  model  equations  and  to  exhibit  typical  l-V  characteristics.  Model 
refinements  are  then  introduced,  which  attempt  to  account  for  the  non- ideal  behavior 
observed  in  real  transistors.  As  refinements  are  added,  the  complexity  of  model 
equations  is  increased,  and  the  distinction  between  theoretical  and  empirical  models 
becomes  clearer.  The  equations  for  a  typical  theoretical  model  and  for  a  typical 
empirical  model  are  presented;  they  are  subsequently  used  in  the  paiameter 
compilation  experiments  described  in  chapter  4. 

2.2:  The  MOS  Field-Effect  Transistor 

A  cutaway  view  of  an  n-channel  MOS  transistor  is  shown  in  Figure  2.1.  The 
transistor  has  four  terminals,  called  the  Drain,  Source,  Gate,  and  Bulk  (or  substrate). 
The  drain  and  source  terminals  are  n-type  diffused  regions,  while  the  bulk  is  p-type 
silicon;  these  pn  junction  diodes  are  reverse  biased  under  normal  operating  conditions. 


Figure  2.1  MOS  Field  Effect  Transistor 


The  gate  terminal  is  typically  made  of  polycrystalline  silicon,  and  is  separated  from 
the  rest  of  the  device  structure  by  a  thin  insulating  layer  of  silicon  dioxide1.  No  DC 
current  can  flow  into  the  gate  terminal  because  of  this  insulator,  so  (unlike  the  bipolar 
junction  transistor)  the  MOS  transistor  is  a  voltage-controlled  device,  as  shown  in  the 
circuit  model  of  Figure  2.2.  Although  the  drain  and  source  terminals  are  symmetric,  a 
labeling  convention  is  adopted  such  that  VDS  >  0. 

Current- voltage  characteristics  of  MOS  transistors  can  be  (roughly)  divided  into 
three  separate  regions  of  operation,  labeled  in  Figure  2.3  as  the  Triode,  Saturation,  and 
Cutoff  regions.  Transistors  operating  in  the  cutoff  region  have  zero  drain  current  l^\ 

current  begins  to  flow  only  when  the  gate-to-source  voltage  exceeds  a  threshold  value 
called  Vj .  N-channel  devices  with  vy  >  0  volts  are  often  called  "enhancement" 

transistors,  while  transistors  with  thresholds  below  0  volts  are  called  "depletion" 

^Gate  dielectric  materials  other  than  silicon  dioxide  are  occasionally  used  in  exotic 
devices. 


< 
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Ids  =  f(Vgs,  Vds,  Vbs) 


.Drain 


Figure  2.2  Circuit  model  of  MOS  transistor 


Figure  2.3  Typical  current  -  voltage  characteristics 


devices. 

In  the  triode  region,  drain  current  is  an  increasing  function  of  drain  voltage  VDS' 

However,  as  drain  voltage  is  increased,  the  current  eventually  levels  off  ("saturates") 
and  the  transistor  enters  the  saturation  region.  The  transition  from  triode  to  saturation 


Figure  2.3  shows  the  locus  of 


occurs  at  a  value  of  drain  voltage  known  as  VDsat- 
VDsat  va,ues' 

The  n-channel  transistor  depicted  in  Figure  2.1  is  built  on  a  p-type  substrate;  an 
n-type  substrate  (with  p-type  source  and  drain)  can  also  be  used,  resulting  in  a  p- 
channel  transistor.  P-channel  devices  operate  exactly  the  same  way  that  n-channel 
transistors  do.  except  thai  .he  algebraic  signs  of  all  voltages  and  currents  are  reversed. 
Thus  a  p  channel,  enhancement  transistor  has  Vj-  <0  volts,  while  a  p-channel 

depletion  device  has  VT  >  0. 

2.3:  A  Simple  First-Order  Model 

A  very  simple  model  of  MOS  transistor  operation  is  developed  in  this  section.  The 
discussion  will  focus  on  n-channel  devices,  although  p-channel  transistors  are  readily 
modeled  by  a  change  of  algebraic  sign.  Results  arising  from  solid  state  physics  will  be 
referenced,  but  used  without  derivation,  in  order  to  treat  the  transistor  as  a  circuit-level 
lumped  element.  Many  solid-state  electrodynamic  analyses  of  the  MOS  transistor  are 
available  in  the  literature  [13,  21,  32,  47];  the  interested  reader  is  referred  to  them  for 
more  detail. 

When  a  positive  gate-to-bulk  bias  VGB  is  applied  to  the  MOS  transistor,  the 

induced  electric  field  across  the  gate  oxide  attracts  negatively-charged  carriers 
(electrons)  to  the  silicon  dioxide  surface,  as  shown  in  Figure  2.4,  These  electrons 
ionize  the  holes  present  at  the  surface,  and  begin  to  deplete  the  p-type  bulk  of 
majority  carriers.  If  the  bias  voltage  is  made  sufficiently  large,  the  number  of  electrons 
at  the  surface  exceeds  the  number  of  holes,  and  the  surface  is  said  to  be  inverted.  So 
many  electrons  are  available  that  the  surface  is  effectively  n-type,  not  p-type.  The 
inversion  layer  at  the  surface  forms  an  n-type  "channel"  from  the  (n-type)  drain  to  the 
(n-type)  source,  allowing  current  to  flow. 


Figure  2.4  Field  induced  channel 


If  the  source  is  shorted  to  the  bulk  (VGS  =  VGB  and  the  drain-to-source 

o 

is  kept  very  small,  the  total  charge  QG  in  the  channel  region  is  given  by‘ 


°c  -T^^GS- 

ox 


Channel  current  is  related  to  channel  charge  through  the  transit  time  T 


tr 


°C  =  'dS  Tlr 


2 


This  discussion  is  after  Muller  and  Kamins  (33],  pp.  350-354. 


voltage 


(2.3.1) 


(2.3.2) 
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Transit  time  is  simply  the  channel  length  divided  by  the  drift  velocity 

r _ .  i2 

~PnE  PnVDS 

The  channel  current  is  therefore 

i  i  W  £ox  ...  ..  .  .. 

DS  =  t  t  (VGS  ~  Vi  *  VDS 

ox 


(2.3.3) 


(2.3.4) 


If  the  drain  to-source  voltage  is  now  increased,  the  channel  potential  is  not 
constant,  but  increases  from  source  to  drain.  This  effect  can  be  approximated  by 
considering  the  entire  channel  to  be  at  its  "average"  potential  (Vqs~  (VDS/2))-  This 

leads  to  a  new  channel  charge  Qc 

~eox 

°C  =  ~r~  <VGS  ~  VT  -  VDS/2^  W  L  (2.3.5) 


and  a  new  drain  current 

'DS  =  ~  J  ^GS  ~VT  ~  VDS/2^  VDS  (2.3.6) 

ox 

At  constant  equation  (2.3.6)  predicts  the  drain  current  to  be  a  parabolic 

function  of  drain  voltage;  see  Figure  2.5.  The  maximum  of  the  parabola  occurs  at 
VDS  ~(VGS~VT^'  Bey°nd  this  drain  voltage,  equation  (2.3.6)  predicts  unreasonable 

behavior,  since  the  incremental  conductance  (slope)  is  negative.  In  fact,  the  drain 
current  saturates  at  this  maximum  value,  becoming  independent  of  VDS  due  to  a 

physical  mechanism  known  as  channel  pinchoff  [33]. 


Figure  2.5  Idealized  current  -  voltage  characteristics 


Equation  (2.3.6)  is  valid  in  the  triode  region  (VQS  <VDsat);  the  saturation  voltage 


is  given  iy 


VDsa(  =  VGS  ~  VT 


(2.3.7) 


At  drain  voltages  above  VD  f,  the  transistor  enters  the  saturation  region,  where  the 
drain  current  is  independent  of  VDS: 


I  -  a  —  (V  -y 

'dS  -^n  T  27  {VQS  VT> 

ox 


(2.3.8) 


The  constants  in  the  preceding  equations  (eg.,  nn>  eQX,  TQx)  are  physical 


parameters  of  the  fabricated  transistors. 


An  equivalent  formulation  would  replace  the  leading  term  of  (2.3.6)  with  a  single 
constant  "K",  yielding  the  familiar  Shichman- Hodges  model  [40]: 


0 

(a) 

if  VDsat  <  0 

'ds  =  K 

T  t VGS  ~  VT  ~  VDS/2)  VDS 

(b) 

lf  VDS  -  VDsat 

(2.3.9) 

K 

2 

—  IV  -  V  )2 
l_  {VGS  VT> 

(c) 

,f  VDS  >  VDsat 

Equation  (2.3.9)  models  all  three  regions  of  operation:  (a)  cutoff,  (b)  triode,  and  (c) 
saturation. 

2.4:  Sources  of  Error 

In  the  first-order  model  of  the  preceding  section,  the  source  was  assumed  to  be 
shorted  to  the  bulk,  and  Vj  was  independent  of  bias  voltage.  Measurements  on  actual 

transistors  show  that  Vj  varies  with  the  source-to-bulk  voltage  see  Figure  2.6. 

This  phenomenon,  known  as  the  "body  effect",  is  incorporated  into  the  Shichman- 
Hodges  model  by  introducing  the  empirical  parameters  V^q,  <p,  and  y: 

VT  =  VT0  +  y  ( +VSB)U2  -  (<P>1/2  1  (2-4.1) 

A  more  serious  source  of  error  in  the  Shichman- Hodges  model  arises  from  the 
assumption  that  the  channel  voltage  can  be  approximated  by  its  average  value 
(VQS  _  (vds/2))  This  approximation  can  be  removed  by  writing  an  expression  for  the 

voltage  drop  across  an  infinitesimal  length  dy  of  the  channel,  at  a  distance  y  from  the 


Figure  2.6  The  body  effect 


source  terminal.  Manipulation  yields 


lDSdy  -  -pnW  Oc(Vc)  dVc 


(2.4.2) 


which  can  be  integrated  to  give  the  drain  current  / ^  [33]: 


i_  1  Cox  (  VGS  "  VFB  ~  VDS/2  >  VDS  ~ 


(2.4.3) 


f  (  2esqNg  )1/2  [  (  2<pf  +VQS  +VSB  -  (2«pF  ^g)372  J; 


Equation  2.4.3  is  valid  in  the  triode  region;  current  is  assumed  to  be  constant  in 
the  saturation  region.  The  transition  voltage  is  given  by 
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VDsat  =  )JG^vFB 


e*qNa 

Vrn-^—l-a  1  + 


2  C 


ox 


ox 
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<vGS-',Fat''s8>l1/2-’)  <2A4) 


Notice  that  equation  2.4.3  explicitly  contains  terms  for  the  "body  effect"  mentioned 
above,  with  coefficients  that  are  physical  constants  of  the  fabrication  process.  It  is  a 
refinement  of  the  simple  theoretical  model  developed  in  section  2.3. 

By  an  appeal  to  channel  pinchoff  it  was  assumed  that  the  incremental  output 
conductance  (0  in  the  saturation  region  is  zero.  Actual  transistors  exhibit 

finite  output  conductance,  which  increases  with  increasing  gate  voltage.  This  effect, 
often  called  "channel  length  modulation",  is  shown  in  Figure  2.7.  The  empirical 
Shichman-Hodges  model  accounts  for  channel  length  modulation  by  including  an 
output  conductance  parameter  V£l  similar  to  the  Early  voltage  of  bipolar  transistors: 


' DS 


*  W 

2  L  (  GS 


-  V'r)2  (  1 


VDS  ~  VDsat  , 

+ - w - 1 


(2.4.5) 


Theoretical  models  have  also  been  developed  to  account  for  finite  output  conductance 
[20];  one  such  model  will  be  shown  in  section  2.6. 


2.5:  Second  Order  Effects 


As  MOS  fabrication  technology  improves,  it  becomes  possible  to  build  smaller 
transistors.  When  a  transistor's  dimensions  are  decreased,  assumptions  made  in  the 
derivation  of  the  model  equations  (e  g.,  that  threshold  voltage  is  independent  of  device 
size)  become  less  valid.  This  section  discusses  several  examples  of  such  second-order 
effects. 

Transistors  with  short  channels  (L  <  9/im)  or  narrow  channels  (IV  <  9/*m)  have 
different  threshold  voltages  than  wide,  long  devices  [1,  2,  8.  10,  48];  see  Figure  2.8. 


Figure  2.7  Channel  length  modulation 


Short-channel  thresholds  are  lower  because  the  depletion  regions  around  the  source 
and  drain  partially  ionize  the  channel.  As  channel  length  is  decreased,  this  additional 
ionization  becomes  an  appreciable  fraction  of  the  total  channel  charge.  Narrow 
channel  transistors,  on  the  other  hand,  have  higher  thresholds  than  wide  devices  [4]; 
this  results  from  field  implants  diffusing  into  the  channel  region  and  raising  the  average 
bulk  doping  density  [45]. 

Another  effect  that  becomes  more  pronounced  in  short  channel  devices  is  the 
nonlinear  velocity-field  relationship.  For  low  electric  fields,  equation  (2.3.3)  is 
applicable,  and  carrier  velocity  is  proportional  to  the  transverse  electric  field 
(E  ~  ~  vqs  /l).  However,  at  high  fields,  velocity  becomes  limited  by  carrier 

interactions  with  lattice  phonons  and  eventually  reaches  a  maximum  value.  This  effect 
limits  the  drain  current  (and  hence  the  gain)  of  shoit  channel  transistors,  as  shown  in 
Figure  2.9.  Two  transistors  with  identical  (W/L)  ratios  are  plotted;  one  has  L  =  20 
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Figure  2.8  Narrow-  and  short-channel  effects 


microns  (solid  curves),  and  the  other  has  L  =  5  microns  (broken  curves). 

In  the  derivation  of  equation  (2.4.3),  it  was  assumed  that  surface  mobility  is 
constant.  Experimentally,  mobility  is  found  to  be  a  function  of  the  vertical  electric  field 
across  the  gate  oxide.  As  gate  voltage  VQS  rises,  mobility  decreases,  so  the  final 

current  of  an  actual  transistor  will  be  less  than  that  predicted  by  (2.4.3).  Two  different 
models  for  this  effect  will  be  presented  in  the  following  sections. 

2.6;  Berkeley  Model 


The  example  theoretical  model,  used  for  the  parameter  compilation  experiments  in 
chapter  4,  is  presented  in  this  section.  It  was  developed  at  the  University  of  California 
at  Berkeley,  and  is  installed  as  the  built-in  model  in  the  SPICE  circuit  simulator  [44]. 
The  symbol  CQx  is  used  to  represent  the  gate  oxide  capacitance  per  unit  area 

eox/7"ox 
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Figure  2.9  Velocity  saturation 


Threshold  Voltage: 

V7-  55  VBIN  +  yS  *VSB^/2 

VBIN  ~  +2lpF  +(>?-  V(2<pp 

Ox 

(Body  effect  coefficient): 

Tc  =  {2qt^.NSUB)V2  (1  -  ac- 


(Depletion  region  width): 


IVS  =  XD  (  *Vsb  )’/2 

(2.6.6) 

tVD  .  XD  (  2»f  *Vsb  <Vds  )1/2 

(2.6.7) 

x  -  (  2's  )1/2 

D  '  q  NSUB  NEFF  ' 

(2.6.8) 

<  IT  £S 

+4CoxW 

(2.6.9) 

Narrow  channel  effects  are  accounted  for  by  equation  (2.6.9),  in  which  rj  increases 
as  channel  widths  decrease.  This  serves  to  raise  the  built-in  voltage  (equation  2.6.2), 
increasing  the  threshold. 

The  input  parameter  XJ  controls  the  modeling  of  short  channel  effects.  Depletion 
regions  around  the  source  and  drain  serve  to  ionize  carriers  in  the  bulk  and  lower  the 
threshold  of  short  channel  transistors.  This  is  modeled  using  Dang’s  trapezoidal 
approach  [10],  where  the  width  of  the  depletion  regions  at  the  source  and  drain  are 
given  by  equations  (2.6.6)  and  (2.6.7)  respectively. 

Surface  Mobility. 


E  --"(V 
x  f  '  1 


GS 


~VT) 


,  UCRIT  .UEXP 
Mc(-r- ) 
x 

**eff  = 

^0 


(a)  if  Ex  >  UCRIT 

( b )  if  Ex  <  UCRIT 


(2.6.10) 


(2.6.11) 
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Mobility  remains  constant,  at  its  low-field  value  of  jig,  until  the  vertical  electric  field 
E  exceeds  UCRIT.  Above  UCRIT ,  mobility  decreases  exponentially  according  to 
equation  (2.6.11a). 

Drain  Current: 

|  \  *™n<VDS'W  <2-6  ,2> 


I'n  T-  I  Cox  <  vas  -  VBIN  -  1 V2  >  v*  - 
eff 

(2.6.13) 

f  rs  f  ( 2«y  *vsb  )3/2  -  '  2ff  *vsb  P 2  » 


This  formula  is  essentially  identical  to  the  theoretical  prediction  given  in  equation 
(2.4.3),  except  for  the  inclusion  of  short  channel  effects  (via  ys)  and  narrow  channel 

effects  (via  rj).  When  VQS  exceeds  ^Dsar  Vx  becomes  constant  and  the  drain  current 
saturates. 

Saturation  Voltage: 


VMAX  = 


Dsat 


W 


Cox  Lelf  (  VGS  “  VBIN  ~  11 VDsal~yS  t  VDsat  +2(pF  +VSB > 


(2.6.14) 


Xn*VMAX  XnVM  AX 

L-‘‘  =  +-J4 - XD[  JL. -  +  {vns  -  Vn„,)]1/2  (2.6.15) 


-eff  ~  P 
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Lp  =  L  -  2 Ld  (2.6.16) 

Expressions  (2.6.14)  and  (2.6.15)  are  nonlinear  equations  in  the  two  unknowns  Leff 

and  VDsat  [44],  Velocity  saturation  (Figure  2.9)  is  accounted  for  by  the  parameter 

VMAX.  Finite  output  conductance  in  the  saturation  region  is  introduced  through  the 
channel  length  modulation  term 

The  input  parameters,  most  of  which  are  physical  constants  of  the  fabrication 
process,  are  summarized  in  Table  2.10.  (Only  "NEFF"  is  an  empirical  adjustment).  The 
actual  Berkeley  model  implemented  in  SPICE  is  slightly  more  complicated,  since  it 
includes  prethreshold  conduction.  However,  this  feature  was  not  used  in  the 
compilation  experiments,  so  the  modeling  equations  are  omitted  here. 

The  model  program  is  relatively  expensive  to  evaluate;  including  the  calculations  of 
the  small  signal  conductances,  it  requires  257  double-precision  floating  multiply 
operations,  125  divisions,  20  square  roots,  6  exponentiation  operations,  and  4  logarithm 
evaluations. 

2.7;  Mosaid  Model 

The  Mosaid  model  was  developed  to  provide  reasonable  modeling  accuracy  while 
permitting  high  speed  simulation.  It  is  based  primarily  on  the  Shichman-Hodges 
equations  (2.3.9),  which  are  very  fast  to  evaluate.  Empirical  parameters  are  introduced 
to  account  for  short  channel  effects,  while  avoiding  nested  iterations  (such  as 
equations  2.6.14  and  2.6.15). 

Effective  Gate  Length:  Leff  =  L  -2 Lq  (2.7. 1) 
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Parameter 

Units 

Description 

Meters-2 

Density  of  surface  states 

2  <PF 

Volts 

Surface  potential 

To* 

Meters 

Gate  oxide  thickness 

NSUB 

Meters-^ 

Substrate  doping  density 

XJ 

Meters 

Junction  depth 

ld 

Meters 

Lateral  diffusion  &  poly  over-etch 

NEFF 

— 

Total  channel  charge  coefficient 

^0 

Meters2/Volt-Second 

Low-field  surface  mobility 

UCRIT 

Volts/Meter 

Critical  field  for  mobility  degradation 

UEXP 

— 

Exponent  of  mobility  reduction 

VMAX 

Meters/Second 

Scattering  limited  carrier  velocity 

Table  2.10  Berkeley  model  parameters 
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Threshold  Voltage: 

y 

VT  =  VT0  +y  l  +VS8)1/2  -  W1/21  -  Kd  (2.7.2) 

Leff 

The  Shichman  Hodges  model  (Equation  2.4.1)  is  used  to  account  for  the  body 
effect,  with  the  addition  of  short  channel  threshold  reduction  through  the  parameter 


* 


/<£,.  Narrow  channel  threshold  effects  (Figure  2.8)  are  not  modeled. 


Effective  Gate  Drive:  Vfl  =  VqS  -  Vj  (2.7.3) 

KP 

Effective  Gain  Factor:  Keff  =  (2.7.4) 

In  this  formulation,  effective  mobility  is  decreased  even  at  low  vertical  electric 
fields,  unlike  the  two-region  model  implemented  in  the  Berkeley  model  (equation 
2.6.11).  This  equation  has  been  found  to  yield  acceptable  accuracy  while  permitting 
fast  evaluation  [9,  10,  47]. 

Drain  Current: 


Kef,W 


DS~  Leff  +2KeffRSVDS  '"e 


<2^  “  W  VDS 


KeffW 


Lef<-CMlVDS-VDsat>  °Sat 


(a)  ifVe  <0 

<*>  ifVDS  <  VDsat  <2-7-5> 

(c)  itVDS  >  VDsaf 


Saturation  Voltage: 


2V_ 


Dsat 


i1/2 


(2.7.6) 


1  +(1  *(Wel,RsVe/Lef,)V 
Velocity  saturation  effects  (Figure  2.9)  are  modeled  by  postulating  a  feedback  term 


R  with  the  dimensions  of  resistance,  such  that  the  gate  drive  Vge  is  reduced  as  drain 


current  increases: 
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=  V, 


GS 


'  VT  ~'dSR 


To  make  the  R  term  geometry-independent,  it  is  replaced  by  ftg/W  in  the  Mosaid 

model.  The  reduced  gate  drive  is  substituted  into  the  Shichman-Hodges  model 
(equation  2.3.9b),  giving  the  drain  current  (equation  2.7.5b).  [KP  in  the  Mosaid  model 
is  equal  to  K/2  in  the  Shichman-Hodges  model]. 

The  saturation  voltage  V Dsgt  is  found  by  setting  d/D^/dVD^  equal  to  zero,  which 

gives  equation  (2.7.6) 


Channel  Length  Modulation: 


a  = 


Dsat 


(2.7.8) 


CM  =  ^ol1  "e 


~alVDS  ~  VDsat) 


(2.7.9) 


The  original  Mosaid  model  used  a  constant  value  of  C^q  in  2.7.5(c):  this  gives  rise  to 


a  slope  discontinuity  at  VDsat-  In  the  linear  region  the  slope  tends 

toward  zero  at  VDsat .  while  in  the  saturation  region  the  slope  remains  finite  at  vDsat  • 

Since  the  SPICE  circuit  simulator  uses  first  derivatives  in  its  Newton-Rhapson 
computations,  a  discontinuous  slope  can  cause  simulation  nonconvergence,  so  the  fix 
described  by  equations  (2.7.8)  and  (2.7.9)  was  inserted.  Parameters  for  the  Mosaid 
model  are  summarized  in  Table  2.11. 

The  Mosaid  model  is  comparatively  inexpensive  to  evaluate;  it  requires  only  36 
double  precision  floating  multiply  operations.  12  divisions,  6  square  roots,  and  2 
exponentiation  calculations. 


Parameter 

Units 

Description 

lD 

Meters 

Lateral  diffusion  &  poly  over  etch 

VT0 

Volts 

Zero-bias  threshold 

r 

Volts172 

Body  effect  coefficient 

<p 

Volts 

Surface  potential 

kd 

Meters 

Short  channel  threshold  coefficient 

KP 

Amps/Volts2 

Gain  constant 

e 

1  /Volts 

Mobility  degradation  coefficient 

Rs 

Ohm  -  Meters 

Velocity  saturation  feedback  constant 

CM0 

Meters  /  Volt 
[ _ 

Channel  length  modulation  constant 

Table  2.11  Mosaid  model  parameters 


Chapter  3:  Minimization  Algorithms 


After  a  mathematical  model  of  transistor  behavior  is  selected,  parameter  values 
must  be  determined  which  give  the  best  fit  to  a  particular  set  of  measurements. 
Serious  modeling  errors  can  occur  if  the  parameter  values  are  poorly  chosen.  To 
insure  that  the  best  possible  modeling  accuracy  is  maintained  across  the  entire  range 
of  transistor  operation,  it  is  desirable  to  create  a  set  of  "optimally  good"  parameter 
values. 

Let  P  be  an  n  -vector  such  that  P^  is  the  value  of  the  k- th  model  parameter. 

Suppose  there  exists  a  function  /:  Rn-*R1  such  that  f(P)  is  a  measure  of  the 
modeling  error  incurred  when  the  parameters  P  are  used.  Then  the  optimum  parameter 
values  exist  at  the  point  P  where  f(P)  is  smallest. 

The  problem  of  finding  the  minimum  value  of  a  function  has  been  extensively 
studied  [12,  16,  18,  30].  This  chapter  presents  five  of  the  better-known  minimization 
algorithms.  Together  with  a  suitable  function  f(P),  they  can  be  used  to  construct 
programs  for  compiling  optimal  sets  of  model  parameters.  Detailed  derivations  of  the 
individual  algorithms  are  not  given,  nor  are  convergence  properties  proven,  since  the 
emphasis  is  on  their  application  and  not  their  design.  In  each  case,  the  original 
papers  are  referenced  for  the  reader  who  wishes  to  study  the  algorithms  in  greater 
detail. 

It  should  be  noted  that  these  algorithms  are  designed  to  find  local  minima;  no 
computationally  tractable  algorithm  is  known  for  finding  the  global  minimum  of  an 
arbitrary  function. 
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3.1:  Notation 


To  simplify  the  discussion  of  the  algorithms,  a  common  notation  has  been  used 

throughout  the  chapter.  The  function  f(x):Rn-*R^  is  to  be  minimized,  and  the 
n -vector  x-  is  the  starting  point  for  the  jth  iteration.  A  displacement  Ax-  is  made  from 


the  present  point  x,  to  the  next  point  x-  i: 

J  J  +  1 


v i  ■ x;  *4x; 


(3.1.1) 


The  displacement  can  be  written  in  the  form 


4x/  •  -a,Pj 


(3.1.2) 


where  p- ,  an  n  -vector,  denotes  the  search  direction  and  aj ,  a  scalar,  is  the  stepsize. 
In  general,  the  symbol 


g  =  V  f  (x ) 


(3.1.3) 


will  be  used  for  the  gradient  of  f,  where  g  is  an  n -vector.  Also, 


G  =  V2f(x) 


(3.1.4) 


is  the  nX  n  matrix  of  second  partial  derivatives.  For  scalar  functions,  the  symbol 
f(*>(x)  will  be  used  to  indicate  the  A  — th  derivative  of  f,  evaluated  at  x. 


3.2:  Hooke  and  Jeeves  Algorithm 


This  algorithm  is  based  on  the  extremely  simple  idea  of  varying  one  coordinate  at 
a  time.  From  an  initial  point  Xqq.  a  step  of  length  8  is  taken  along  coordinate  1.  f  is 

evaluated  at  (Xqq),  (x^  +  5  e  1),  and  (*oo“  ^ei)-  where  e1  is  a  unit  vector  along 
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coordinate  1.  Whichever  ot  these  three  points  gives  the  lowest  value  of  f  is  then 
named  xni,  and  the  process  is  repeated  for  all  n  unit  vectors,  finally  producing  xn 


which  completes  iteration  0. 

When  a  displacement  is  made  that  reduces  the  function  value,  further  steps  along 
the  same  direction  will  probably  continue  to  decrease  I.  The  Hooke  and  Jeeves 
algorithm  [26]  attempts  to  exploit  this  property,  using  a  successful  direction  of  descent 
as  a  first  try  for  further  exploration.  For  this  reason,  the  algorithm  is  also  called 
"pattern  search"  in  the  literature.  Instead  of  starting  iteration  1  at  the  endpoint  of 
iteration  0,  the  displacement  vector  is  doubled  in  length: 


=  Xfin  +  2(x  -  Xqq)  -  2xnn  -  x 


On  *00 


(3.2.1) 


The  algorithm's  progress  on  minimizing  a  function  of  two  variables  is  shown  in  Figure 


As  the  algorithm  proceeds  toward  a  minimum,  the  stepsize  S  eventually  becomes 
too  large  and  must  be  reduced.  It  is  replaced  when  an  iteration  fails  to  reduce  the 
function  value:  f(xjn)  >  H*j g)-  A  simple  geometric  formula  is  used: 


S  =  p  5  (p  <  1) 


(3.2.2) 


The  algorithm  terminates  when  the  value  of  5  is  reduced  below  some  minimum  stepsize 
tHJ.  The  implementation  of  this  algorithm  described  in  chapter  4  uses 


EHJ  *  '°' 

*0  * 
p  =  0.4 


(3.2.3) 
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Figure  3.1  Hooke  and  Jeeves  algorithm 


3.3:  Simplex  Algorithm 

This  algorithm  is  based  on  topological  generalization  of  a  triangle1.  Just  as  a 
triangle  is  a  two-dimensional  object  having  three  vertices,  a  simplex  is  an 
n  -dimensional  object  having  n  +  1  vertices.  A  tetrahedron,  for  example,  is  a  simplex  in 
3-dimensional  space. 

From  a  starting  simplex  of  n  +  1  points,  the  point  at  which  f  is  highest  is  reflected 
through  the  hyperplane  formed  by  the  other  n  points.  If  the  function  value  at  the  new 
point  is  lower,  the  simplex  has  been  improved  and  the  procedure  can  be  repeated. 
During  the  minimization  procedure  the  simplex  changes  in  size  and  shape,  adapting  to 

^This  algorithm  for  function  minimization  should  not  be  confused  with  Dantzig’s  "sim¬ 
plex  algorithm"  for  linear  programming. 


¥ 

I 
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the  local  contour  of  f.  The  algorithm  is  given  below  using  the  notation  of  Fidler  and 
Nightingale  [14]: 

Xy  is  the  vertex  which  gives  the  highest  value  of  f(x) 

Xg  is  the  vertex  which  gives  the  second  highest  value  of  f  (x ) 

xL  is  the  vertex  which  gives  the  lowest  value  of  f(x) 

x  0  is  the  centroid  of  all  vertices  xk  other  than  x H : 

*0  =  Z**  (*  *W)  (33.1) 

Three  different  operations  may  be  performed  on  the  simplex. 

(1)  The  vertex  xH  may  be  reflected  through  the  centroid  to  give  xR: 

xR  =  (1  +  a)xQ  -  a  xH  (3.3.2) 

(2)  The  simplex  may  be  expanded  along  a  favorable  direction: 

*£  =  y*R  +  (1  -  Y)  *o  (3.3.3) 

(3)  The  simplex  may  be  contracted: 

*c  ~  PXR  +  0  _  P)*o  (3.3.4) 

These  operations  are  used  to  find  the  minimum  value  of  f  in  Algorithm  3.2  [14, 
35], 

The  algorithm  is  terminated  when  the  standard  error  falls  below  a  given  threshold  e^: 

SlM*.-)-  /_J2  5 

- (3.3.5) 

n  ° 

Since  the  vertices  of  the  initial  simplex  may  be  widely  separated,  the  algorithm 
essentially  has  n  +  1  independent  starting  points.  The  impact  of  an  especially  poor 
starting  point  is  therefore  reduced. 
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[1]  The  distinguished  vertices  x^,  xs,  xL<  and  xQ  are  computed. 

[2]  xff  is  computed  according  to  (3.3.2). 

[3]  If  f(xp)<f  (xH ),  but  f{Xfl)>f  [xL ),  an  improvement  has 
been  made,  so  xH  is  replaced  by  xR,  and  another  iteration  can 
begin  at  step  [1]. 

[4]  If  f  (xR )  <  1  (xL ),  the  reflected  point  has  a  lower  function 

value  than  all  other  vertices.  This  direction  appears  to  be  favor¬ 
able,  so  the  expansion  (3.3.3)  is  attempted,  x ^  is  replaced  by 

whichever  of  x ^  or  x^  gives  the  lower  function  value,  followed 

by  a  return  to  step  [1]. 

[5]  If  f(xR)<  t(xH),  but  f(xR)>  f  (x ),  only  a  minor  improvement 

has  been  made,  and  a  contraction  is  performed  using  equation 
(3.3.4)  in  case  the  reflection  has  overshot  a  better  point.  xH  is 

replaced  by  whichever  of  xfl  or  x^  gives  the  lower  function 

value,  and  the  algorithm  returns  to  step  [1]. 

[6]  If  f(xR)>  f  (xH ),  no  improvement  has  been  made  by  reflec¬ 
tion.  The  implication  is  that  the  minimum  probably  lies  within  the 
simplex,  so  the  simplex  is  shrunk  about  the  lowest  point  : 

*k  =  (xk  +*L)/2 


Algorithm  3.2  The  simplex  method 


Chapter  4  describes  an  implementation  of  the  simplex  algorithm  which  uses 


•s  ■  1(r" 

a  =  1.0 

P  =  0.5 

y  =  2.0 


(3.3.6) 
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3.4:  Newton's  Iteration 


The  Hooke  and  Jeeves  and  simplex  algorithms  operate  on  function  values  only;  no 
derivative  information  is  required.  If  the  first  and  second  derivatives  of  f  are  available, 
more  powerful  algorithms  may  be  applied,  giving  faster  and  more  reliable  convergence. 
These  algorithms  use  a  first-order  Taylor  series  expansion  of  I  about  In  the  one- 

variable  case, 


f(x2)  m  / (x  1)  +  [x2-  x^  f(1)(x-,) 


(3.4.1) 


A  point  x2  is  not  a  minimum  unless 


- 0 


(3.4.2) 


Maximum  points  and  inflection  points  also  satisfy  (3.4.2);  local  explorations  about  x2 

are  usually  performed  to  verify  that  it  is  indeed  a  minimum. 

An  approximate  solution  to  (3.4.2)  can  be  found  by  differentiating  (3.4.1)  and 
setting  the  result  equal  to  zero. 


°  =  +  l*2~  x/  7^1> 
u  dx^ 


(3.4.3) 


Equation  (3.4.3)  has  the  solution 


x2  =  x  1  ~ 


(x,) 


1  f(2)  (*-,) 


(3.4.4) 


which  is  known  as  Newton’s  iteration. 

In  the  multivariable  case  f  :Rn  ->R1,  the  first-order  Taylor  approximation  is 

/(*2)  S  f(xj  +gT[*2-  x.,]  (3.4.5) 

where  g  is  the  gradient  of  f  at  x  1 .  The  derivative  of  equation  (3.4.5)  is  set  equal  to 
zero,  giving  a  minimum  at 

x2  =  x1  -  G'1  g  (3.4.6) 

where  G  is  the  matrix  of  second  partial  derivatives  at  x^  In  the  terminology  of  (3.1.2), 
the  search  direction  p1  is  given  by  (G_1  g),  and  the  stepsize  is  equal  to  one. 

Equation  (3.4.6)  is  used  to  calculate  the  displacement  Ax^  =  -G~^9j,  which 

provides  the  starting  point  for  iteration  (/  +1).  Iterations  are  performed  in  this  fashion 
until  convergence  is  achieved.  The  algorithm  is  considered  to  have  converged  when 
the  gradient  is  approximately  zero  (see  equation  3.4.2): 

9T9  <  eN,  (34.7) 

Newton’s  iteration  has  the  desirable  property  of  quadratic  convergence:  if  f  is  a 
quadratic  function,  the  minimum  point  is  achieved  in  a  number  of  iterations  at  most 
equal  to  the  number  of  variables  [27],  Since  an  arbitrary  function  can  be  closely 
approximated  by  a  quadratic  over  a  small  interval,  the  algorithm  will  converge  equally 
rapidly  on  arbitrary  functions,  providing  the  starting  point  is  close  to  the  minimum.  This 
property  assures  rapid  convergence  in  the  final  stage  of  computation. 

Newton’s  iteration  (3.4.6)  has  limited  usefulness  as  a  practical  minimization 
algorithm,  because  it  requires  the  user  to  supply  formulae  for  calculating  the  n  first 

derivatives  in  the  vector  g,  and  the  (n2)  second  derivatives  in  the  matrix  G.  Frequently 
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this  is  inconvenient  (or  impossible),  as  for  example  when  the  number  of  coordinates  n 
is  large.  Variations  of  the  basic  Newton  iteration  have  been  developed  which  do  not 
require  second  derivatives;  they  are  much  more  useful  than  the  original  formula  (3.4.6). 

These  quasi-Newton  algorithms  use  approximations  of  G  or  G-^  to  eliminate  the  need 
for  second  derivatives. 

3.5:  Quasi-Newton  Algorithms 

Huang  [27]  has  presented  a  class  of  quadratically  convergent  algorithms  which 
replace  Newton's  iteration  (3.4.6)  by  the  approximation 

Pj  =  Hj  9j  ■  tej  =  ~aj  Pj  (3.5.1) 

where  the  n  X  n  matrix  H  characterizes  a  particular  algorithm.  (In  all  cases,  H  is 
computed  without  using  second  derivatives).  The  initial  matrix  HQ  is  taken  to  be  the 

identity  matrix.  Several  well-known  and  highly  successful  minimization  algorithms  are 
members  of  Huang's  class,  including  the  conjugate-gradient  algorithms  and  the  variable 
metric  algorithms. 

Once  the  search  direction  p .  is  known,  the  stepsize  <Xj  must  be  chosen.  For  good 
progress  toward  the  minimum  of  f,  the  value  of  ctj  should  be  selected  which  minimizes 
the  scalar  function  u  (a ): 


u  (a  )  =  f(Xj  -  apj)  (3.5.2) 

Very  efficient  algorithms  are  known  for  univariate  minimization,  so  the  optimum  a  can 
quickly  be  found.  This  determines  the  displacement  Axy  and  hence  the  new  point 

xj  +y  ar|d  another  iteration  can  begin.  The  algorithm  terminates  when  the  gradient  is 


sufficiently  small,  similar  to  the  Newton  termination  criterion  (3.4.7). 
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3.6:  Conjugate-Gradient  Algorithm 

A  particularly  appealing  member  of  Huang's  class  is  the  conjugate-gradient 
algorithm,  studied  by  Polak  and  Ribiere  [37].  It  uses  the  approximation 


"j  "  '  + 


gj-Sgj-  1 


where  I  is  the  identity  matrix.  Substitution  into  (3.5.1)  yields 


(3.6.1) 


Pj  -  ffy  +  yPj.  1 


where  the  scalar  y  is  given  by 


r  = 


to,- 1  -  o/sj 


(3.6.2) 


(3.6.3) 


This  substitution  shows  that  the  matrix  Hj  need  not  be  stored  directly;  only  the  vectors 

Pj-V  Pj1  9j-  v  anc*  Pj  are  required.  The  Polak-Ribiere  approach  is  therefore 

attractive  for  very  large  problems,  where  the  storage  cos.  of  an  n  X  n  H-matrix  would 
be  prohibitive. 

Equation  (36.2)  gives  the  search  direction  Pj,  and  the  steplength  a-  is  computed 

by  univariate  minimization.  Iterations  proceed  in  this  fashion  until  the  gradient  vector 
approaches  zero: 


g  <  e 


CG 


(=  10-1°) 


(3.6.4) 


4 


3.7:  Fletchers  Switching-Policy  Algorithm 


This  algorithm  combines  the  well  known  Davidon-Fletcher-Powell  [17]  and 
Broyden-Fletcher-Goldfarb-Shanno  [23]  variable  metric  procedures,  giving  a  hybrid 
algorithm  which  is  superior  to  each  [15]. 

The  DFP  algorithm  is  a  member  of  Huang's  class,  characterized  by 


Vi 


v  iVy  i 


(3.7.1) 


where 


*  ■  -r  Vi 

v-VVi  ,3'7'2' 


The  BFGS  algorithm  is  also  a  member  of  Huang’s  class: 

1  w,_  iT^r  yTn.  lY  SfiT 

Hj  =  H  1 - -4  -  +  [  1  + - (3.7.3) 

Fletcher  s  switching  policy  algorithm  uses  either  (3.7.1)  or  (3.7.3),  depending  on  the 
outcome  of  a  test:  if 


sTy  Z  yTHj_:y 


(3.7.4) 


then  the  BFGS  update  formula  (3.7.3)  is  used  for  iteration  j.  Otherwise,  the  DFP 
update  (3.7.1)  is  used.  Fletcher  has  presented  performance  data  indicating  that  this 
hybrid  algorithm  is  faster  than  either  of  its  component  parts.  The  hybrid  approach,  he 


claims,  avoids  near-singularity  in  the  H  matrix,  making  the  algorithm  more  stable  [15]. 

If  the  H  matrix  is  not  positive  definite,  then  stability  and  convergence  to  a  minimum 
are  much  less  likely.  Fletcher’s  algorithm  contains  a  safety  test  which  prevents  H  from 
becoming  nonpositive  definite.  Whenever 

STy  <  (  =  10"16)  (3.7.5) 

the  algorithm  is  "restarted"  by  setting  Hj  to  the  identity  matrix.  The  algorithm  is 
terminated  when  the  gradient  becomes  sufficiently  small: 

gTg  <  tp  (=ur10)  (3.6.4) 

3.8:  Line  Searches 

After  the  search  direction  p  is  determined,  the  line  (x  -  op)  is  searched  to  find 
the  optimum  value  of  a .  In  this  section,  two  methods  are  discussed  for  performing  a 
univariate  search.  Both  algorithms  proceed  by  finding  an  interval  (a,  b)  which  is 
known  to  contain  the  minimum  point  x.  The  size  |b  -  a  |  of  this  "bracket"  is  then 
decreased  repeatedly  until  a  minimum  is  found. 

A  simple  criterion  may  be  applied  to  find  an  initial  bracket:  at  least  one  local 
minimum  exists  in  the  interval  (a,  b)  if  there  is  a  point  c  within  the  interval,  such  that 

(a<c<b)  and  (f(c)<f(a)]  and  [f(c)</(b)l  (3.8.1) 

A  bracket  satisfying  (3  8.1)  may  be  easily  found  using  the  algorithm  of  Davies,  Swann, 
and  Campey  [6];  the  procedure  is  illustrated  in  Figure  3.3.  From  the  initial  point  a  =  0, 
a  step  of  size  6  is  taken  in  the  +  a  direction.  Stepsizes  are  repeatedly  doubled  until 


I 
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Figure  3.3  Establishing  an  initial  bracket 


the  minimum  is  overshot  and  condition  (3.8.1)  is  met,  establishing  the  first  bracket. 

Once  the  initial  bracket  is  found,  a  simple  algorithm  for  converging  to  a  minimum 
is  the  Golden  Section  search,  illustrated  in  Figure  3.4.  New  points  c  and  d  are 
selected,  which  divide  the  original  interval  into  three  pieces:  (a,  c),  (c.  d),  and  (d,  b). 
The  function  1  is  evaluated  at  c  and  d,  and  the  reduced  interval  is  given  by 


(a  ’,  b*)  =  (a,  d)  if  f(c)  <f(d) 

(a  *,  b')  =  (c,  b)  if  1(c)  >  f(d) 


(3.8.2) 


The  size  of  the  bracket  has  been  reduced  by  the  fraction  ad/ab  (or  cb/ab).  To  keep 
the  algorithm  unbiased,  these  fractions  should  be  identical.  Choosing  c  and  d  such 
that  cb  =  ad  insures  that  the  bracket  will  shrink  by  the  same  amount  on  either  side. 

Two  function  evaluations  were  used  in  the  simplified  iteration  described  above  (at 
c  and  d).  The  Golden  Section  algorithm  actually  uses  only  one  evaluation  per 
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Figure  3.4  Golden  Section  search 


iteration,  through  careful  selection  of  the  points  c  and  d.  If  the  new  bracket  is,  for 
example,  (a,  d),  then  the  old  point  c  lies  within  the  bracket,  and  f  (c )  has  already  been 
evaluated.  The  point  c  should  therefore  be  reused,  so  that  only  one  new  point  needs 
to  be  evaluated  in  the  next  iteration. 

If  the  ratio  ad/ab  is  equal  to  the  ratio  ac/ad,  then  the  point  c  can  be  reused  and 
the  resulting  partition  of  the  bracket  will  be  unbiased,  as  shown  in  Figure  3.4. 
Normalizing  to  ab  =  1,  let  y  =  ad  so  that  (1  -  y)  =  ac.  Then 


(3.8.3) 


This  equation  produces  the  Golden  Section  ratio  used  in  Greek  architecture  (hence  the 
algorithm's  name).  The  solution  is 
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(5)1/2  -  1 


0.61803 


(3.8.4) 


Two  function  evaluations  are  performed  for  the  first  bracket.  Each  subsequent  bracket 
uses  equation  (3.8.4)  to  compute  one  new  point,  so  only  one  function  evaluation  is 

required  per  iteration.  After  n  iterations,  the  bracket  has  shrunk  to  (0.61803)'7  of  its 
original  size.  The  search  is  completed  when  the  bracket  has  been  reduced  to  a 
sufficiently  small  size 


\b  -  a  |  <  t0  (=  10~15)  ,  or 


|b— -a  |-  <  (=  10-3) 
b  ~~  ° 


(3.8.5) 


Table  3.5  compares  the  Golden  Section  algorithm  to  a  simple  binary  search.  On 
each  iteration,  binary  search  halves  the  bracket,  but  two  function  evaluations  are  used. 
The  table  shows  the  number  of  function  evaluations  required  to  shrink  an  initial  bracket 
of  size  |b  -  a  |  =1  down  to  various  final  sizes;  binary  search  takes  approximately  1.4 
times  as  many  iterations  as  the  Golden  Section  search. 


Line  Search 


Final  bracket  size 


Algorithm 

IE-1 

IE-2 

IE-3 

iE-4 

IE-5 

IE-6 

Binary  search 

8 

14 

20 

28 

34 

40 

Golden  Section 

5 

10 

15 

20 

24 

29 

Table  3.5  Function  evaluations  required  for  bracket  reduction 


Another  popular  line  search  algorithm  uses  a  quadratic  interpolation  scheme  for 
finding  the  minimum  [22].  Given  an  initial  bracket  (a,  b)  and  a  point  c  within  the 
bracket,  a  quadratic  is  fitted  to  these  three  points.  The  minimum  of  the  quadratic  is 
given  by  [38]: 

w  =  1  [b2-  c2]((a)  +  [c2-  a2]f(b )  +  [a2  -  b2]f(c)  (386) 

2  [b  -  c]f  (a)  +  ]c  -  a]f  (b)  +  [a  -  b]f  (c) 

A  new  bracket  is  then  formed,  taking  the  three  lowest  points  found  so  far  (which  still 
maintain  a  bracket).  Another  interpolation  is  computed,  and  the  process  is  repeated. 
Near  the  minimum,  a  quadratic  interpolation  will  be  very  close  to  any  arbitrary  function, 
and  the  final  stages  of  convergence  will  be  very  fast. 

After  evaluating  both  line  search  algorithms,  the  Golden  Section  method  was 
selected  for  use  in  the  minimization  implementations  described  in  chapter  4  When  an 
initial  bracket  contained  several  minima  (a  common  occurrence),  Golden  Section 
appeared  to  converge  more  rapidly  than  quadratic  fitting:  presumably  this  is  because  a 
quadratic  is  a  poor  approximation  to  a  function  with  more  than  one  minimum. 

3.9:  Numerical  Differentiation 


In  many  cases,  analytic  derivatives  of  f  (x )  are  not  available,  so  they  must  be 
computed  numerically.  For  example,  the  derivative  dl^^/dVl^AX  for  the  Berkeley 


model  does  not  exist  in  closed  form  (see  equation  2.6.14).  A  finite  difference  formula 
can  be  used  to  estimate  the  derivatives: 


f(x  +  6 ey )  -  f(x) 
8 


(3.9.1) 


If  f(x)  is  known,  this  forward-difference  formula  requires  only  one  additional  function 
evaluation  to  compute  the  derivative.  A  central-difference  approach  would  need  to 
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evaluate  f  at  both  (x  +Se^)  and  (x  -  Se^);  this  is  too  expensive  for  use  in  a 

minimization  algorithm.  The  size  S  of  the  finite  interval  is  chosen  to  reflect  the  scale 
of  the  component  x^: 

6  =  max  (h  x. ,  S *)  (3.9.2) 


where 


(3.9,3) 


3.70.  Marquardl's  Algorithm 

Suppose  a  physical  process  is  modeled  by  the  equation 

y  =  M(V,x)  (3.10.1) 

where  y  is  the  dependent  variable,  V  =  (l^,  V ^  ....  Vy)  are  the  independent  variables, 

and  x  s  (x  ....  x  )  are  adjustable  parameters  of  the  mathematical  model  /W.  A  set 

of  m  experiments  is  made,  obtaining  values  of  y  for  different  values  of  V.  The  problem 
is  to  find  those  values  of  x  which  give  the  experimental  data  the  best  fit  to  (3.10.1). 

The  best  fit  in  the  least-squares  sense  is  obtained  by  defining  the  m  residuals: 

fl,(x)  =  M(V.,  x)  -  y(  (3.10.2) 

and  minimizing  f ,  the  sum  of  squares  of  these  residuals  [28]: 


fix)  =  Z*,(x)2  =  R(*)TR(x) 


(3.10.3) 
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Special  purpose  algorithms  for  minimizing  sum-ofsquares  functions  such  as  (3.10.3) 
have  been  developed;  the  most  popular  of  these  methods  is  due  to  Marquardt  [31]. 
Newton's  iteration  can  be  applied  directly  to  (3.10.3),  giving 

G(x)Ax  =  -  g(x)  (3.10.4) 

The  gradient  g(x)  can  be  expressed  in  terms  of  the  vector  ft  of  residuals: 

g(x)  =  2J(x)TR(x)  (3.10.5) 

where  J  (x )  is  the  m  X  n  Jacobian  matrix  with 

9R(x) 

=  - -  (3.10.6) 

U  ox- 

The  matrix  G  of  second  derivatives  is  given  by 

G  (x )  =  2J(x)TJ(x)  +  2j:V2fi(x)fl(.(x)  (3.10.7) 

where  V^ft^x)  is  the  second  derivative  matrix  of  ft.(x)  [28]. 

Equation  (3.10.7)  shows  that  a  substantial  part  of  the  matrix  G  is  obtained  by 
using  only  first  derivatives.  Near  the  solution,  if  the  residuals  are  small  or  almost  linear, 
then  the  second  term  of  (3.10.7)  is  negligible.  This  leads  to  the  approximation 

G(x)  §  2J  (x  )TJ  (x )  (3.10.8) 

which  can  be  substituted  into  Newton’s  iteration  (3.10.4),  giving 

J(x)rJ(x)  Ax  =  -  J(x)rft(x)  (3.10.9) 
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Equation  (3.10.9)  is  known  as  the  Gauss-Newton  algorithm. 

In  practice  it  has  been  found  that  (3.10.8)  is  sometimes  a  poor  approximation, 
leading  to  divergence  of  the  iterates.  Marquardt  suggests  that  a  "damping  term”  (A/) 
should  be  added  to  the  Gauss-Newton  algorithm  [31]: 


(JTJ  +\l  )bx  =  -  JtR 


(3.10.10) 


where  /  is  the  identity  matrix  and  A  >  0  is  a  variable  parameter  of  the  algorithm. 

On  any  particular  iteration,  J  and  R  are  fixed,  so  that  Ax  can  be  considered  a 
function  of  A.  When  A  =  0,  Marquardt's  algorithm  reverts  to  the  Gauss-Newton  formula 

(3.10.9).  As  A-»oo,  then  Ax-*  -JTR/ A,  which  is  an  incremental  step  along  the 
direction  of  steepest  descent  of  t.  Note  that  divergence  is  impossible  with  suitably 
large  values  of  A. 

At  iteration  /,  Marquardt's  algorithm  solves  (3.10.10)  for  Ax,  using  increasingly 
larger  values  of  A  until  a  displacement  Ax  is  obtained  for  which  f(x.  +Ax)<  f(Xy). 

Typically  A  is  increased  in  powers  of  ten  until  a  function  decrease  is  obtained: 


A  =  10  A 


(3.10.11) 


The  algorithm  is  halted  when  the  displacement  Ax  is  negligibly  small 


T  +  |X(.| 


-  eM 


(3.10.12) 


or  when  the  value  of  A  exceeds  a  preset  threshold 


A  >  A 


(3.10.13) 


Chapter  4: 


Automatic  Parameter  Extraction 


Parameters  for  empirical  MOS  models  are  usually  generated  by  a  tedious  set  of 
hand  calculations,  followed  by  an  iterative  procedure  of  "tweaking"  the  numbers  until 
an  acceptable  curve  fit  is  obtained.  Although  it  has  been  used  successfully  for  many 
years,  this  method  suffers  from  at  least  three  major  drawbacks:  (1)  it  is  time  consuming, 
(2)  it  is  prone  to  errors  of  omission  or  oversight,  and  (3)  its  termination  criterion  is 
qualitative  rather  than  quantitative.  This  chapter  discusses  the  computer 
implementation  of  numerical  algorithms  for  parameter  extraction,  designed  to  eliminate 
these  drawbacks.  Prototype  versions  of  parameter  compilers  (using  these  numerical 
algorithms)  are  presented.  The  performance  of  these  compilers,  measured  in  terms  of 
modeling  accuracy  and  compilation  speed,  is  detailed. 

Since  transistors  have  several  different  regions  of  operation  (Cjtoff,  Triode, 
Saturation)  and  geometry-dependent  behavior  (short  and  narrow  channel  effects),  many 
measurements  are  needed  to  adequately  characterize  device  operation  in  all  regimes. 
These  regions  are  arbitrarily  constructed  for  ease  of  explanation  and  understanding; 
transistor  behavior  is  actually  a  continuum,  with  certain  effects  more  pronounced  in 
some  areas  than  in  others.  This  leads  to  difficulties  in  parameter  extraction,  because 
(erroneous)  assumptions  are  often  made  that  device  behavior  is  localized  within  these 
regions.  For  maximum  accuracy,  the  entire  operating  range  must  be  considered  when 
extracting  parameters,  because  they  interact  in  the  final  model.  Unfortunately,  manual 
parameter  extraction  methods  assume  that  (some)  parameters  are  independent,  and 
that  their  effects  are  localized;  this  necessitates  "tweaking"  the  final  numbers  to 
obtain  acceptable  accuracy. 


4.1:  Manual  Methods 


To  illustrate  the  methods  used  in  manual  parameter  extraction,  a  manual  extraction 
will  be  performed  for  the  Mosaid  model  of  section  2.7.  The  parameters  of  long,  wide 
channel  transistors  will  be  extracted  first  [44];  the  model  fit  for  these  large  devices 
should  be  very  good,  because  they  will  be  closely  approximated  by  the  first-order 
theory.  Then  small  geometry  effects  will  be  accounted  for,  taking  care  to  isolate  the 
effect  of  each  parameter,  so  its  value  can  be  calculated  independently  from  the  others. 
The  first  parameter  to  be  measured  will  be  the  threshold  voltage  Vj  of  a  long, 

wide  channel  device.  Equation  (2.7.5b)  shows  that  ,f  VDS  is  very  small,  drain  current 
is  proportional  to  vqs  ~  vr  ~  VDS/2'  Measuremen,s  drain  current  are  taken  at 
various  values  of  gate  voltage  current  begins  to  How  at  =  Vj  +  VDS/2. 

This  is  shown  in  Figure  4.1.  Drain  current  is  not  precisely  linear  with  gate  voltage, 
because  the  surface  mobility  begins  to  fall  as  VGS  rises.  VT  is  found  by 

extrapolating  to  lDS  =  0  from  the  maximum-slope  portion  of  the  curve,  to  insure  that 

degraded  mobility  is  not  inadvertently  included  in  the  threshold  voltage. 

Figure  2.6  shows  that  threshold  voltage  is  a  function  of  source  to  bulk  voltage 
VSB'  bY  repeating  the  measurement  of  Figure  4.1  at  several  values  of  VSB'  the  body- 

effect  parameters  <p  and  y  can  be  computed.  The  parameter  Vjq  is  simply  the 

measured  value  of  Vj  taken  at  =  0.  Rearranging  equation  2.4.1, 
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(Vds  =  SOmV) 


Vt  +  Vds/2 


Figure  4.1  Measurement  of  Vj. 

Initially  taking  q>  =  0.6  (a  good  approximation  for  silicon  devices)1,  a  change  of 


variables  can  be  made: 


=  VT  ~  VT  0 


=  -y  («p)' 


Equation  (4.1.1)  is  transformed  into  the  familiar  linear  equation 


(4.1.2) 


mx  +  b 


(4.1.3) 


'  9  represents  the  Fermi  potential  of  the  silicon  surface,  2<pF.  It  is  proportional  to  the 

logarithm  of  the  bulk  doping  density,  and  its  value  is  usually  close  to  0.6  in  typical 
silicon  MOS  transistors. 


Using  the  trial  value  of  <p,  the  measured  data  can  be  plugged  into  equations 
(4.1.2)  and  (4.1.3),  extracting  values  for  m  and  b  with  a  simple  linear  regression.  This 
computation  is  easy  to  perform,  and  is  implemented  on  several  hand-held  scientific 
calculators  [24], 

The  new  value  of  <p  is  inserted  into  equation  (4.1.2)  in  place  of  the  initial  guess, 
and  another  linear  regression  is  performed.  Regressions  are  iterated  in  this  fashion 
until  the  values  of  <p  and  y  stabilize  (say,  to  within  0.1%). 

After  extracting  the  model  parameters  for  threshold,  the  gain  parameters  KP  and  6 
are  computed.  Drain  current  measurements  are  taken  on  long,  wide  devices  in  the 
saturation  region  (where  equation  (2.7.5c)  applies).  Solving  for  KP, 

KP  *»<vcs-VI  w  '—2 

[VGS  ~  V 

which  suggests  the  substitution 


L_ 

W 


'DS 


GS 


Vt)‘ 


_  L_  ~(VGS  VT)IDS 

W  _  y  )2 

'VGS  VT> 

m  =  6 
b  =  KP 


(4.1.5) 


The  measurements  are  plugged  into  (4.1.5),  and  a  linear  regression  is  performed, 
giving  6  and  KP. 

The  channelshortening  parameter  Lp  can  be  found  by  measuring 
transconductance  ^iqs/^^gs'^  as  a  *unct'on  °f  mask  channel  length  t.  Two 


transistors  with  identical  channel  widths  and  different  channel  lengths  are  operated  at 
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low  drain  voltage,  and  measurement  of  are  taken  at  several  values  of  V^g. 

Transconductance  is  the  maximum  value  of  the  slope  of  this  curve;  see  Figure  4.1. 
Under  these  conditions, 


Slope 


Const 
L  +  AL 


(4.1.6) 


where  AL  =  (-2 LQ ). 

Measured  values  Slope,  and  Slope2  are  taken  on  devices  with  mask  channel 
lengths  L,  and  L  2  The  two  resulting  equations  (4.1.6)  are  divided,  giving  the  ratio 


Slope.,  L  2  +  AL 

Slopeg  L  ,  +  AL 


(4.1.7) 


Solving  for  AL, 


AL  = 


L  2  Slope2  -  L  ,  Slope, 
Slope,  -  Slope2 


(4.1.8) 


At  this  point,  model  parameters  VjQ,  y,  <p,  KP,  0,  and  Lp  are  known  for  long,  wide 
transistors.  The  small-size  parameters  are  extracted  next,  beginning  with  Kp. 
Rewriting  equation  (2.7.2), 


Kn  = 


-eff 


0  V. 


f  V70-VT  nKT’+^g)1/2-^)1^]  ) 


*1/2i 


DS 


(4.1.9) 


At  a  constant  VDS'  threshold  measurements  are  made  for  several  values  of 
channel  length  Lelf  Each  data  point  is  plugged  into  equation  (4.1.9),  giving  a  value 
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of  Kq  .  These  are  averaged  to  produce  the  final  Kq- 


jo  extract  a  value  of  R§,  drain  current  measurements  are  taken  in  the  triode 


region.  The  current  equation  (2.7.5c)  is  solved  for  ,  giving 


Rc  =  - 

S  2  K 


elf  VDS 


[Kef,WV PS^e'W  _  L 


OS 


eft 


(4.1.10) 


Several  lD $  measurements  are  taken  on  short  channel  transistors  in  the  triode 
region,  and  values  of  Rs  are  then  computed  from  (4.1.10).  The  final  value  of  R$  is 
taken  as  the  average  of  the  results  from  (4.1.10).  Leff'  Keff'  and  Ve  values  used  in 

these  calculations  should  be  computed  from  equations  (2.7.1),  (2.7.4),  and  (2.7.2) 
respectively. 

CMq  can  similarly  be  extracted  from  lDS  measurements  made  on  short-channel 
devices  in  the  saturation  regiot..  Equation  (2.7.5c)  is  manipulated  to  yield 


'MO  \j  _  y 

VDsat  VDS 


,  Kell  w  'W2  ,  , 

t  ;  Leff  j 


DS 


(4.1.11) 


VDsat  is  comPuted  ,rom  equation  2.7.6  and  known  values  of  the  other  parameters. 
The  final  value  of  CMq  is  taken  to  be  the  average  of  the  individual  values  computed  by 


equation  (4.1.11). 

Several  simplifying  assumptions  were  made  in  the  parameter  extraction  recipe 
given  above,  resulting  in  increased  modeling  errors.  For  example,  equation  (4.1.6) 
ignores  the  flg  term  in  the  denominator  of  equation  (2.7.5b)  by  assuming  that  drain 


current  is  inversely  proportional  to  (L  +  AL).  More  significantly,  the  values  of  V^-g,  y, 
<p,  and  Kd  are  derived  from  indirect  measurements:  first  a  quantity  called  " Vj "  is 


computed,  and  then  parameter  values  are  fitted  from  threshold  data. 


Actual  MOS  transistors  do  not  abruptly  turn  off  at  VGS  =  VT'  as  shown  in  Figure 

4.2.  Below  threshold,  drain  current  is  an  exponential  function  of  gate  voltage  [42,  43). 
The  Berkeley  model  includes  an  optional  term  to  account  for  this  "prethreshold 
conduction";  it  was  omitted  from  the  discussion  in  chapter  2  because  the  option  was 
turned  off  in  the  compilation  experiments.  (The  Mosaid  model  does  not  account  for 
this  current).  The  gradual  transition  from  "off"  to  "on"  makes  the  task  of  selecting  a 
unique  Vj  difficult,  so  parameter  values  extracted  from  such  measurements  will  have 

large  uncertainties. 


4.2:  Numerical  Techniques 

The  manual  parameter  extraction  technique  presented  in  the  last  section  can  be 
thought  of  as  a  sequence  of  minimizations:  in  each  case,  values  for  parameters  are 
chosen  so  as  to  minimize  modeling  error.  This  is  accomplished  by  using  linear 
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regressions  or  by  simple  averaging  Unfortunately,  the  errors  introduced  in  each  of  the 
individual  extractions  can  accumulate,  leading  to  a  higher  total  modeling  error  (over  the 
entire  range  of  device  operation). 

The  single  pass  nature  of  the  manual  extraction  technique  does  not  permit  residual 
model  errors  to  be  spread  among  several  parameters,  which  could  lead  to  an  overall 
error  reduction.  Since  each  parameter  is  extracted  in  a  small  region  of  the  total  device 
operating  range,  with  this  scheme  it  is  impossible  to  distribute  errors  over  many  such 
regions.  In  this  section,  numerical  minimization  algorithms  (presented  in  chapter  3)  will 
be  used  to  extract  all  parameters  simultaneously. 

The  objective  function  to  be  minimized  will  be  a  measure  of  the  modeling  error,  for 
example  the  normalized  error  at  each  data  point 


'd Sm^  ~  'DSp(P)0) 


(4.2.1) 


At  the  yth  data  point,  the  modeling  error  e  ■  is  the  normalized  difference  between  the 
measured  current  lDSm(i)  and  the  model’s  predicted  current  lD5p  (P)(j).  (The  notation 
,DSp(P)U)  stands  for  the  predicted  value  of  lQS  at  the  ;th  data  point,  using  the  vector 
P  of  model  parameters).  Measurement  /  is  taken  on  a  transistor  of  size  (Wj/Lj),  at 
(VGSU).  vDSU),  VSBU) ). 

If  the  model  current  is  a  continuous,  differentiable  function  of  the  parameters,  then 
so  is  the  modeling  error  e^  at  each  point.  Quasi  Newton  minimization  algorithms, 

which  require  require  the  objective  function  to  be  differentiable,  may  therefore  be  used 
with  equation  (4.2.1). 

A  simple  measure  of  the  total  modeling  error  with  parameter- vector  P  is  the  sum  of 
the  squares  of  the  n  individual  errors 
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f(P)=e12+e22+-  +ep2  (4.2.2) 

which  is  a  differentiable  function  of  P. 

The  distribution  of  measurements  about  the  total  operating  range  of  the  transistor 
will  affect  the  accuracy  of  the  final  model.  For  example,  if  the  vast  majority  of 
measurements  are  made  in  the  linear  region,  model  accuracy  in  the  saturation  region 
will  be  impaired.  Regions  of  operation  are,  therefore,  implicitly  weighted  in  the  error 
formulation,  according  to  their  frequency  of  appearance  in  the  measurement  data.  If 
the  measurements  are  clustered  in  a  small  region,  the  error  will  be  lowest  in  that 
vicinity.  Large  errors  at  a  few  outlying  points  will  be  tolerated,  because  their 
contribution  to  the  final  total  is  small. 

Explicit  weighting  is  also  possible  [46],  giving  a  total  error  function 


f  (P)  =  w  f  +  w2e22  +  +wnen2  (4.2.3) 

Weights  might  be  explicitly  applied  if  it  is  desired  to  emphasize  modeling  accuracy 
in  certain  crucial  regions  of  operation.  Although  implicit  weighting  could  be  used 
(simply  by  including  many  data  points  in  those  regions),  explicit  weights  are  more 
suitable,  because  they  require  fewer  total  data  points  (and  fewer  evaluations  of  t). 

If  the  measurements  are  made  on  physical  instruments,  some  accuracy  will  be  lost 
at  very  low  currents.  It  is  desirable  to  prevent  the  extraction  algorithm  from  generating 
faulty  parameter  values  due  to  noise  in  these  measurements.  A  modified  error  function 
can  be  used  to  set  a  "current  floor”;  below  this  floor  value  /mjn,  measurements  are 

given  less  weight  in  the  overall  error  function  [46]. 


,  -  wp)(j) 

1  '  max  f  W'>'  'mini 


(4.2.4) 


A  floor  value  is  also  useful  if  the  model  to  be  extracted  does  not  account  for 


prethreshold  conduction. 

Several  of  the  algorithms  presented  in  chapter  3  are  sensitive  to  the  scale  of  the 
variables;  if  the  elements  of  the  parameter-vector  differ  by  orders  of  magnitude,  severe 
loss  of  accuracy  will  occur  [31].  For  this  reason  the  parameter  vector  is  approximately 
normalized;  its  elements  are  divided  by  typical  values,  insuring  that  all  parameters  are 
near  unity.  The  parameters  are  of  course  de- normalized  for  the  evaluation  of  the 
actual  modeling  routine. 

The  organization  of  a  model  parameter  extraction  program  is  diagrammed  in  Figure 
4.3.  First,  n  transistor  measurements  are  input;  they  include  the  measurement 
conditions  (W,  L,  VGS<  VDS,  Vsg)  and  the  resulting  drain  current  lgs .  A  numerical 

minimization  algorithm  is  then  used  to  perturb  the  parameter  vector  P  until  the  error 
function  l(P)  of  equation  (4.2.2)  is  minimized.  For  each  measurement,  the  parameters 
P  and  the  measurement  conditions  are  input  to  the  empirical  MOS  modeling  routine, 
giving  a  predicted  drain  current  lDSp-  Equation  (4.2.4)  is  used  to  find  the  error  for 

that  measurement,  which  is  summed  with  the  others  to  form  /(P). 

This  program  organization  is  rather  independent  of  the  specific  model  equations; 
new  models  may  be  extracted  merely  by  substituting  a  new  empirical  modeling  routine. 

4.3:  Compilation 

The  program  of  Figure  4.3  can  be  fed  input  measurements  which  are  in  fact  the 
results  of  a  simulation,  a  model  compilation  is  performed  exactly  this  way.  Parameters 
for  an  empirical  model  are  then  generated  automatically,  using  input  parameters  for  a 
theoretical  model  to  drive  the  simulation. 

To  examine  the  model  accuracy  obtainable  with  this  approach,  five  parameter 
extraction  programs  were  constructed  and  tested.  The  only  differences  among  the 
programs  were  the  numerical  minimization  algorithms  used.  Extractions  were  performed 
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physical 

measurements 


optimal  P 


Figure  4.3  Program  organization 

i 

with  Hooke  &  Jeeves,  Simplex,  Conjugate  Gradient,  Fletcher,  and  Marquardt 
procedures. 

Parametf'  "compilations",  as  described  in  chapter  1,  were  performed:  a  set  of 
parameters  for  the  Berkeley  model  was  supplied,  and  the  extraction  programs 
computed  a  corresponding  set  of  Mosaid  model  parameters.  The  SPICE  2G.5  program 
was  used  to  simulate  measurements  of  test  transistors  (using  the  Berkeley  model  and 
the  given  parameters),  and  these  measurements  were  fed  to  the  extraction  programs. 
Extraction  algorithms  minimized  the  error  function  (4.2.3),  attempting  to  generate 
Mosaid  model  parameters  which  would  predict  exactly  the  same  current  voltage 
behavior  as  the  Berkeley  parameters  did. 

The  simulated  "measurements"  must  be  carefully  chosen  to  represent  the  whole 
range  of  anticipated  device  operation.  If  no  data  is  supplied  for  a  particular  region  of 
operation,  the  resulting  empirical  model  is  forced  to  extrapolate  to  predict  behavior  in 
that  region,  resulting  in  higher  modeling  error.  Short,  long,  wide,  and  narrow  channel 
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devices  should  be  included  in  the  measurements,  operating  in  cutoff,  triode,  and 
saturation.  Both  low  and  high-current  measurements  should  be  taken,  and  back-bias 
effects  (Vge>0)  should  be  included.  Table  4.4  shows  the  simulated  measurements 

used  in  the  compilation  experiments.  Six  values  of  VDS'  five  values  of  VGS'  four 
values  of  W/L,  and  two  values  of  VSB  were  simulated,  for  a  total  of  240  data  points. 

1 


Variable 

Values 

(W/L) 

(100/100) 

(100/4) 

(4/100) 

(4/4) 

VSB 

0 

-3 

VGS 

0.5 

1 

1.5 

3  5 

VDS 

0.05 

1 

2 

3  5  6 

Table  4.4  Data  points  for  parameter  compilation 

The  parameters  used  for  the  Berkeley  theoretical  model  are  shown  in  Table  4.5. 
They  represent  a  typical  contemporary  n-cbannel  MOS  process,  approximately  of  the 
same  dimensions  as  "HMOS-1"  [36]. 

A  set  of  normalization  constants  were  used  to  keep  the  elements  of  the  Mosaid 
parameter  vector  approximately  at  the  same  order  of  magnitude  (near  unity);  these  are 
shown  in  Table  4.6.  The  starting  values  of  the  normalized  parameters  were  all  set  to 
1.0. 

Explicit  weighting  was  used  to  emphasize  measurements  at  small  currents;  this 
boosts  model  accuracy  near  "Vj",  and  improves  circuit  simulations  of  inverter  trip- 


< 

! 

1 


i 


1 

Table  4.6  Mosaic!  parameter  normalization  coefficients 

^  to  find  a  value  of  /mjn  that  gives  a  reasonable  balance  between  errors  at  low  and  high 

currents. 

Table  4.7  shows  the  total  modeling  error  f(P)  and  RMS  percentage  error 
i  [100  (  f  (P)/240  )1/2  ]  at  four  different  values  of  /  j  .  The  error  values  cited  in  this 

table  are  the  errors  produced  by  Fletcher’s  minimization  algorithm.  As  /  jn  increases, 

total  error  decreases,  since  the  error  contributions  of  low-current  measurements  are 
<  diminished. 

Figures  4.8  4.11  show  the  model  fit,  from  parameters  extracted  by  Fletcher's 

algorithm,  as  a  function  of  /min.  The  l-V  curves  of  a  (100/4)  transistor  are  plotted. 

I  simulated  with  the  input  theoretical  model  (solid  curve)  and  with  the  resulting  empirical 


Parameter 

Multiplier 

VT0 

1.0 

KP 

IE-5 

y 

0.1 

<p 

0.1 

e 

0.1 

ld 

IE-6 

o 

o 

IE-7 

kd 

IE-7 

Rs 

IE-3 

Table  4.7  Effect  of  /mjn  on  modeling  error 

model  (dotted  curve).  One  set  of  curves  is  given  for  low  currents  (VGg  =0,  0.5,  1.0, 

....  2.5)  and  one  set  is  plotted  for  high  currents  (VGS  =0,  1,  2 .  5).  In  both  cases 

VDS  ranges  from  0  to  6  volts. 

Comparing  Figures  4.8  and  4.9,  it  appears  that  the  fit  is  not  dramatically  altered  by 
raising  /mjn  from  2  to  20  microamps.  However,  as  /mjn  is  increased  to  60  and  200 

microamps  (Figures  4.10  and  4.11),  the  curves  begin  to  move  and  accuracy  is  improved 
at  high  currents.  Eventually  low-current  accuracy  is  sacrificed,  and  increasing  /  j 

worsens  the  problem.  Based  on  these  observations,  a  value  of  / mjn  =  60  pA  was 

selected  to  be  incorporated  into  the  final  parameter  compilation  procedure. 


Figure  4.11a  Model  Fit  at  Low  Current,  /  •  =  200  microamps. 
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Figure  4.nb  Model  Fit  at  High  Current  / 

mm 


200  microamps. 


These  parameter  extractions  were  performed  using  a  sum-of-squares  measure  of 
modeling  error.  Sum-of  squares  tends  to  optimize  the  average  case,  because  the  error 
of  each  data  point  is  summed  into  the  final  measurement.  A  different  style  of  error 
measure  can  be  defined  which  optimizes  the  worst  case;  it  is  called  "minimax": 

g(P)  =  max  |ey.  |  (4.3.2) 

Note  that  the  function  g(P)  in  the  minimax  formulation  is  not  differentiable;  quasi- 
Newton  algorithms  therefore  cannot  minimize  it  directly.  Although  techniques  exist  tor 
transforming  a  minimax  problem  into  a  sequence  of  differentiable  functions  [5],  it  is 
simpler  to  exploit  a  minimization  algorithm  which  does  not  require  differentiability. 

Far  away  from  the  minimax  solution,  it  is  likely  that  g(P)  =  1,  because  some  data 
point  will  give  !meas  >  0  while  the  model  predicts  lpred  =  0  (see  equation  4.2.4). 

Small  perturbations  of  the  parameter  vector  P  will  not  change  g(P),  and  the 
minimization  algorithm  will  terminate  because  the  local  gradient  is  zero.  The  minimax 
formulation  therefore  requires  a  starting  point  which  is  close  to  the  final  solution;  in 
particular  g(P)  must  be  less  than  1.  A  useful  heuristic  which  often  satisfies  this 
requirement  is  to  start  the  minimax  algorithm  with  the  final  parameter  vector  produced 
by  a  sum-of-squares  procedure;  this  effectively  adds  the  execution  times  of  the  two 
programs. 

A  version  of  the  Hooke  &  Jeeves  algorithm  was  coded  which  used  (4.3.2)  as  the 
error  function.  The  value  of  p  was  increased  to  0.8,  so  the  steplength  did  not  shrink 
so  quickly  between  iterations.  This  prevents  the  algorithm  from  overlooking  a 
promising  search  direction.  Modeling  results  using  this  minimax  technique  are  shown 
in  Figure  4.12.  Comparing  these  with  Figure  4.10.  the  overall  curve-fits  obtained  by 
minimax  optimization  appear  to  be  inferior  to  sum-of-squares  results.  Since  there  is  no 
penalty  (in  terms  of  g)  for  increasing  the  error  at  a  point  which  is  not  the  worst-case, 
the  minimax  algorithm  tolerates  rather  large  errors  at  all  points. 
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4.4:  Algorithm  Performance 

Five  extraction  programs  using  sum-of-squares  minimization,  and  one  program 
using  minimax,  were  monitored  for  three  values  of  floor  current:  /  ^  =  20,  60,  and  200 

(iA.  Table  4.13  shows  the  performance  of  each  algorithm  when  compiling  the  example 

O 

model  from  the  previous  section.  All  experiments  were  performed  on  a  VAX  11/780 

q 

computer  running  release  4.1  of  the  V7  UNIX0  operating  system.  Programs  were  coded 
in  the  C  language. 

The  third  column  of  Table  4.13  indicates  how  many  times  the  error  function  f(P) 
(equation  4.2.3)  was  evaluated  during  the  minimization,  while  the  fourth  column  gives 
the  total  run  time  in  user  CPU-seconds.  The  last  column  shows  the  value  of  f(P) 
returned  when  the  algorithm  terminated.  Function  evaluations  and  CPU  times  given  for 
the  Hooke  &  Jeeves  minimax  algorithm  include  the  first  sum-of-squares  phase.  Note 
also  that  g(P)  values  from  the  minimax  algorithm  are  not  directly  comparable  to  f(P) 
values  produced  by  the  other  algorithms. 

Parameter  extraction  time  is  related  to  the  complexity  of  the  model:  the  longer  it 
takes  to  evaluate  f(P),  the  longer  it  will  take  to  extract  the  model  parameters.  If  f(P) 
is  sufficiently  expensive  to  compute,  the  computational  overhead  of  the  minimization 
algorithm  becomes  negligible,  and  execution  speed  is  determined  solely  by  the 
evaluations  of  t  (P).  Therefore  the  number  of  computations  of  f(P)  should  be 
considered  along  with  the  CPU  time  when  deciding  the  relative  merits  of  extraction 
algorithms. 

The  Marquardt  algorithm  is  designed  to  handle  only  sum-of-squares  problems,  yet 
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Algorithm 

'min'  ^ 

1  (P)  evals 

CPU  time 

final  f(P) 

Fletcher 

20 

3230 

913.0 

0.7713 

Marquardt 

20 

3801 

1549.3 

0.7733 

Hooke&Jeeves 

20 

2080 

632.8 

1.3864 

Simplex 

20 

3230 

2638.0 

0.7713 

Conj.  Grad. 

20 

12289 

3459.7 

0.9567 

Minimax 

20 

3455 

1018.9 

0.15986 

Fletcher 

60 

■i 

3288 

907.1 

0.4095 

Marquardt 

60 

2528 

1044.8 

0.4095 

Hooke&Jeeves 

60 

1754 

499.1 

0.5439 

Simplex 

60 

5774 

1710.2 

O', 095 

Conj.  Grad. 

60 

12205 

3571.1 

0.4256 

Minimax 

60 

2976 

800.6 

0.13944 

Fletcher 

200 

2880 

813.3 

0.1280 

Marquardt 

200 

856 

370.0 

0.1280 

Hooke&Jeeves 

200 

1455 

436.3 

0.1614 

Simplex 

200 

5774 

3304.6 

0.1282 

Conj.  Grad. 

200 

12205 

3505.6 

0.1294 

Minimax 

200 

2624 

760.0 

0.06888 

Table  4.13  Parameter  compilation  speed  of  minimization  algorithms 


it  is  not  superior  to  Fletcher's  method  except  in  the  case  /mjn  =  200  pA.  Recent  data 

indicates  that  if  the  objective  function  is  highly  nonlinear,  and  the  final  error  is 
nonzero,  then  special  sum  of  squares  algorithms  are  no  better  than  general  minimizers 
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[34]. 

Hooke  &  Jeeves  is  the  generally  the  fastest  algorithm,  but  it  yields  function  values 
which  are  not  minimal.  If,  however,  the  model  is  extremely  expensive  to  evaluate, 
Hooke  &  Jeeves  can  be  used  to  quickly  locate  a  parameter-vector  near  the  minimum. 
Then  a  slower,  more  accurate  procedure  (such  as  Marquardt’s)  could  seek  out  the  final 
optimum. 

Fletcher's  variable  metric  algorithm  consistently  produces  the  lowest  value  of  f(P), 
while  maintaining  a  relatively  small  number  of  function  evaluations  and  a  rapid 
execution  speed.  It  is  therefore  chosen  as  the  single  minimization  procedure  to  be 
incorporated  into  the  final  parameter  compiler. 

Table  4.14  gives  the  (unnormalized)  values  of  the  Mosaid  parameters  produced  by 
Fletcher's  algorithm  for  each  value  of  /  j  . 

4.5.  Simulation  Speed 

To  asses  the  benefits  derived  from  using  an  empirical  model,  simulations  were 
performed  on  two  identical  circuits.  One  was  simulated  using  the  Berkeley  model,  and 
the  other  was  simulated  using  the  compiled  Mosaid  model  of  the  previous  section.  The 
simulations  were  identical  in  every  other  respect,  so  the  difference  in  measured 

ution  speed  was  caused  by  the  difference  in  the  models. 

i  ii»',  was  repeated  for  fourteen  circuits,  each  having  a  different 

•t  •'  "  ti  v,tnf,  to  monitor  the  effect  of  circuit  size  on  relative  simulation 

■  "  t  iguie  4  15  they  consist  of  an  iterated  parallel 

■  ...e  tii.-i  i-  oology  was  chosen  to  guarantee  that 
•  ■  e  ■  ■•>.  im-  '.t- -p  other  tc  pologies  (such 
•  ti,  it  i  (.n-.tiot  .line  for  many 
i  h.f  i  ened'il  to 
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the  number  of  simulator  evaluations  is  no  longer  proportional  to  circuit  size. 

If  all  of  the  iterated  modules  in  the  parallel  topology  were  identical,  the  waveforms 
at  the  internal  nodes  would  be  identical.  A  (very  sophisticated)  static  pre  analysis 
could  notice  this  fact,  and  bypass  the  majority  of  the  computation  by  simulating  only 
one  module.  The  circuit  of  Figure  4.15  avoids  this  problem  by  making  transistor  Ml  a 
different  size  in  each  module.  Channel  length  was  selected  by  generating  a  random 
number  in  the  range  (8 n  <  L  <  1 2/i),  and  channel  width  was  chosen  from 
(20/i  <  W  <  30/i). 

The  results  of  these  experiments  are  shown  in  Tables  4.16  and  4.17;  they  are  also 


Parameter 

/  ■  =  200 
min 

=  60 
mm 

lm:  n  =60 
mm 

'min  =  20 

min 

(SSO) 

(SSQ) 

(Minimax) 

(SSO) 

VT0 

2.065E-1 

1.956E-1 

1.834E-1 

1.838E-1 

KP 

1.249E-5 

1.148E-5 

1.205E-5 

1.059E-5 

y 

3.717E-1 

3.900E-1 

4.043E-1 

4.045E- 1 

<p 

5.485E-1 

5.51  IE-1 

7.318E-1 

5.734E-1 

6 

2.082E-1 

1.769E-1 

1.856E-1 

1.516E-1 

LD 

8.196E-7 

8.539E-7 

6.823E-7 

9  076E-7 

CM0 

8.602E-8 

9.247E-8 

6.654E-8 

9.040E-8 

kd 

3.487E-8 

- 

3.073E-8 

4.386E-8 

2.740E-8 

Rs 

1.053E-2 

1.265E-2 

6.734E  3 

1.473E-2 

Table  4.14  Compiled  values  of  Mosaid  model  parameters 
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plotted  (in  normalized  form)  as  Figure  1.1.  Measured  SPICE  execution  time  for  a  circuit 
of  n  transistors  was  fitted  to  the  equation 

Time  =  C  +  Knf 

Sum-of-squares  fitting  was  used  to  find  optimum  values  of  C,  K,  and  E;  the  worst-case 
error  at  any  single  point  was  6%.  The  computed  optimum  values  are 

Model  C  K  E 


Mosaid 

2.141 

3.298 

1.036 

Berkeley 

2.455 

5.125 

1.042 

The  "average  speedup”  achieved  by  using  the  Mosaid  model  instead  of  the  Berkeley 
model  is  K^osajd  /  K  Berkeley  =  °  6435'  I  hat  is'  SPICE  simulations  using  the  Mosaid 

model  will  take  65%  as  long  as  the  same  simulations  using  the  Berkeley  model. 


average  SPICE 

Number  of 

a 

n 

time  (seconds) 

trials 

(seconds) 

2 

8.873 

100 

0.13 

5 

20.06 

100 

0.35 

10 

37.26 

20 

0.76 

20 

76.48 

10 

0.82 

25 

92.50 

5 

1.19 

50 

- 1 

186.58 

5 

2.00 

75 

294.54 

10 

1.63 

100 

418.00 

2 

1.41 

150 

594.14 

5 

1.32 

200 

827.60 

1 

... 

300 

1200.1 

1 

... 

450 

1833.2 

1 

... 

725 

2943.4 

1 

... 

1000 

4341.5 

1  ! 

... 

Table  4.16  Simulation  speed  using  Mosaid  model 
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Chapter  5:  Conclusions  and  Directions  for  Further  Research 

The  principal  goal  of  this  effort  has  been  to  demonstrate  the  practicality  of  a  two- 
level  modeling  scheme.  An  automatic  parameter  extraction  program  was  exhibited  in 
chapter  4,  proving  the  feasibility  of  the  proposed  translation  mechanism.  Parameters 
for  the  Berkeley  theoretical  model  were  successfully  compiled  into  parameters  for  the 
Mosaid  empirical  model.  Errors  introduced  by  the  second  level  of  modeling  were  small 
(<5%),  which  suggests  that  simulation  accuracy  will  not  be  appreciably  degraded. 

Several  numerical  minimization  algorithms  were  tested,  and  Fletcher's  switching- 
policy  variable  metric  algorithm  was  found  best  for  extracting  Mosaid  model  parameters. 
The  Marquardt  algorithm  did  not  offer  substantial  improvements  in  speed  or  accuracy, 
even  though  it  is  specifically  designed  to  operate  on  parameter-extraction  problems.  A 
"current  floor"  of  60  microamperes  was  found  to  provide  a  good  tradeoff  between 
accuracy  at  low  currents  and  accuracy  at  high  currents. 

The  effect  of  model  complexity  on  total  simulation  time  was  measured,  and  it  was 
found  that  simple  empirical  models  provide  substantial  improvements  over  theoretical 
models.  Speedups  approaching  35  percent  were  observed,  indicating  that  the  two-level 
modeling  technique  will  allow  significant  increases  in  simulation  productivity.  Much 
additional  investigation  remains  to  be  done;  this  chapter  outlines  several  areas  in  which 
the  two-level  modeling  idea  can  be  extended  and  improved. 

5. 1:  Highly  Nonlinear  Models 

The  Mosaid  model  used  in  the  compilation  experiments  is  characterized  by  a  small 
set  of  equations,  in  which  each  model  parameter  is  a  coefficient  for  a  simple 
polynomial.  Predicted  currents  are  gently  varying  functions  of  parameter  values,  so 
minimization  procedures  which  operate  on  first  derivatives  will  give  fast  convergence. 
If,  however,  a  model  with  a  strong  nonlinear  relationship  between  parameter  values  and 
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predicted  currents  is  chosen,  these  algorithms  will  converge  much  more  slowly  [25]. 
This  suggests  a  combined  approach,  using  a  function-comparison  method  to  quickly 
find  a  point  near  the  optimum  (where  the  error  in  the  Newton  approximation  is  small). 
Then  a  quasi-Newton  procedure  could  be  used  for  the  final  convergence  stage. 

5.2:  Nearly  Redundant  Parameters 

Model  equations  may  be  defined  such  that  some  parameters  are  redundant  or 
nearly  redundant.  For  example,  the  equation  /  =  is  a  poor  model  of  a  resistor 

because  K2  is  redundant:  there  are  an  infinite  number  of  (K^  k2)  pairs  which  predict 

the  same  current-voltage  behavior. 

The  Berkeley  model  suffers  from  this  problem,  because  device  gain  is  set  by  the 
ratio  of  two  input  parameters: 

,  ^0 
DS  “  J 

ox 

This  apparent  redundancy  can  send  some  minimization  algorithms  into  slightly-damped 
oscillation,  repeatedly  varying  *1q  and  Tqx  until  a  stable  solution  is  stumbled  upon.  The 

two  parameters  are  not  completely  redundant,  but  the  differences  between  them  are 
only  manifested  in  second  order  effects  (such  as  the  body-effect  coefficient  yg). 

Parameter  extraction  is  therefore  extremely  slow,  because  the  distinction  between  /Iq 

and  Tqx  is  only  apparent  near  the  optimum  point. 

Preliminary  extraction  experiments  with  the  Berkeley  model  tend  to  indicate  that 
simple  function-comparison  procedures  (such  as  the  Simplex  algorithm)  converge 
sooner  than  quasi-Newton  routines,  particularly  if  the  starting  point  is  far  away  from  the 
i  optimum.  Other  extraction  efforts  on  the  Berkeley  model  [46]  have  ignored  the 


I 


-  86  • 


/ 


f 

redundant-parameter  problem,  preferring  to  fix  one  of  the  two  parameters  as  a 
constant,  and  using  a  quasi-Newton  algorithm  to  extract  the  other. 

I  5.3;  Incremental  Extraction 

To  adequately  represent  device  behavior  in  all  important  regions  of  operation,  the 
|  parameter  extraction  algorithm  requires  many  input  measurements.  For  each 

measurement,  a  model  prediction  must  be  calculated,  so  the  time  required  to  perform 
an  extraction  is  proportional  to  the  number  of  measurements  supplied.  Unfortunately, 

there  seems  to  be  a  tradeoff  between  extraction  time  and  goodness  of  fit. 

i 

This  difficulty  may  lend  itself  to  a  divide-and-conquer  approach:  an  extraction  is 
performed  on  a  small  subset  of  the  data,  producing  an  optimal  point  in  the  parameter 
space  for  that  subset.  Then  a  full  extraction  is  run  on  the  entire  set  of  measurements, 
{  using  this  solution  as  a  starting  point.  The  improved  starting  point  will  allow  the  full 

extraction  to  use  fewer  iterations  to  compute  the  optimum  parameter  values.  If  several 
iterations  of  the  full  extraction  can  be  eliminated,  the  net  savings  in  extraction  time  will 

I  be  substantial. 

* 

5.4:  Multiple  Models 

Circuit  simulators  have  traditionally  implemented  a  single  model,  with  a  single  set 
of  parameters,  to  represent  all  transistors  that  could  possibly  be  built  with  a  given 
fabrication  technology.  This  restriction  could  be  relaxed,  allowing  model  parameters  to 
"I  be  optimized  for  a  particular  subset  of  all  possible  transistors.  An  obvious  example 

would  be  to  extract  four  sets  of  parameters,  separately  modeling  the  four  different 
size-classes  of  transistors: 
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Width 

Length 

Cjass 
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